Malaria Detection¶

Executive Summary¶

Malaria is one of the oldest and deadliest diseases and has killed millions of people across history. The disease took a staggering 150 to 300 million lives in the 20th century [1], and 619,000 lives in 2021 [2]. Modern machine learning techniques, such as neural networks, can be used to diagnose the presence of infection in cells. Increasing diagnosis speed and accuracy this way will minimize the overdiagnosis of malaria, and allow for more attention to patients and families who are affected by the disease. This will reduce economic and social burden alongside increasing the overall effectiveness of doctors, especially in underserved areas where resources are limited.

The best model, a convolutional neural network used alongside data augmentation and regularization techniques, performs with 98.5% accuracy. Recall is 99% for parasitized instances, meaning the model keeps false negatives to a minimum. This is ideal in a healthcare diagnosis context, where a false negative is the worst-case outcome. With just 542,514 parameters, the model features 3 convolutional layers for feature extraction, 2 MaxPooling layers, 3 dense layers, and a dropout layer located after the first dense layer in the model. The leaky ReLU activation function was used, with the exception of the first convolutional layer, which used the hyperbolic tangent function. The small parameter count means easier deployment, on-the-ground training, and model updates in areas with fewer computing resources and more technological constraints.

Due to computing resource constraints, some larger models were not tested and experimented on in depth. Another pretrained model, besides the VGG16 model which was tested, might yield higher accuracy. Additionally, altering the architecture of the proposed model with different activation functions, and adding or removing feature extraction layers, may help boost performance. It may also be a matter of altering the learning rate parameter.

Furthermore, additional data augmentation strategies like Gaussian blurring may be adopted as part of a slightly different pipeline. For example, the Albumentations library offers more image transformations than TensorFlow's native ImageDataGenerator class [3].

Stakeholders should pursue additional testing and training with this model, with more data from diverse sources, and boost accuracy even further. Tweaks should be tested on the model architecture. Additionally, this model should be tested in clinical scenarios with physicians verifying results. Stakeholders should be cautious in deploying and using this model and should ensure that a physician double checks results, especially in cases where the model is less confident in its classification. Stakeholders may also seek to use this model on personal devices, such as laptops and cellphones, to mitigate problems such as stolen equipment and unreliable electricity and power.

Problem and Solution Summary¶

Malaria is a large-scale health problem. As one of the oldest and deadliest diseases, it has killed millions of people across history. Claiming between 150 and 300 million lives in the 20th century [1], and 619,000 in 2021 [2], it comprises a significant share of humanity's death and disease burden. It may also be represented as a data science problem -- we are classifying diseased and uninfected instances on the basis of features present in our patients and images of cells. More specifically, the data science problem we are solving here is the issue of malaria classification based on visual features in images, a problem that convolutional neural networks (CNNs) are well suited for. Hence, a convolutional neural network was built here.

Several different models were tested, ranging from relatively simple to complex, with varying model architectures and data augmentation strategies. Broadly speaking, the models tested were an initial base model, a second model with more layers, a third model with batch normalization, a fourth model with data augmentation strategies and a dropout layer, and a final fifth model with the VGG16 pre-trained model. Details may be found in the full code below. Some model variations and architectures were not tested due to computing and cost constraints, and should be tested more in depth as suggested in the recommendations section.

The third model was picked as the best model, and offers a 98.5% accuracy rate, with 99% recall for parasitized instances, meaning a high accuracy solution with a minimization of false negatives. This minimization of false negatives is highly favorable within the context of a health problem like malaria diagnosis -- the worst case scenario is to wrongfully determine an infected person as healthy. The F1 score for parasitized and uninfected instances is 99% and 98%, respectively. This means that overall, both false negatives and false positives are rare.
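The link between recall, false negatives, and the F1 score can be sketched as follows. The confusion-matrix counts below are hypothetical, chosen only to illustrate the formulas; they are not the model's actual results.

```python
# Hypothetical confusion-matrix counts for the "parasitized" class.
# These numbers are illustrative only and are NOT the model's actual results.
true_positives = 990   # parasitized cells correctly flagged
false_negatives = 10   # parasitized cells missed (the worst case here)
false_positives = 20   # uninfected cells wrongly flagged


def recall(tp: int, fn: int) -> float:
    """Share of truly parasitized cells that the model catches."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of cells flagged as parasitized that really are parasitized."""
    return tp / (tp + fp)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


r = recall(true_positives, false_negatives)
p = precision(true_positives, false_positives)
print(f"recall={r:.3f} precision={p:.3f} f1={f1_score(p, r):.3f}")
```

Because recall only falls when false negatives rise, a high recall for the parasitized class is the metric that most directly tracks the worst-case error described above.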

The constructed model features 3 convolutional layers for feature extraction, 2 MaxPooling layers, 3 dense layers, and a dropout layer located after the first dense layer in the model for regularization purposes. Additionally, the leaky ReLU activation function was used, with the exceptions of the first convolutional layer, which used the hyperbolic tangent function, and the last dense layer, which used a softmax function. This amounts to a relatively lightweight and fast solution with 542,514 parameters.

Practically speaking, this simplicity means that the model will take up less space on hard drives and run faster. The utility of this is amplified when we consider that this model will in all likelihood be used in developing countries, where malaria is not only more prevalent, but clinical workers are working with more technological and computing constraints, as well as potentially sparse internet access. Not only will this model be more accommodating to older and slower hardware, it will be easier to update on the ground. Data scientists, or perhaps clinicians with data science skills, will be able to more easily retrain and fine-tune a model like this one, with only 540 thousand parameters. Compare this to certain pre-trained models, like the VGG16 model tested here, which utilized approximately 15 million parameters. Not only is the proposed model smaller and faster than the VGG16 model, it is more accurate, with the VGG16 model accuracy at 96.2% versus the proposed model's 98.5%.
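As a rough sketch of what the difference in parameter count means for storage, the snippet below estimates the size of the raw weights. It assumes each parameter is stored as a 32-bit float and ignores any file-format overhead; both are assumptions made for illustration, and the 15 million figure is the approximate count quoted above.

```python
BYTES_PER_FLOAT32 = 4  # assumption: every weight is stored as a 32-bit float


def weights_size_mb(num_parameters: int) -> float:
    """Approximate size of the raw weights in megabytes (10^6 bytes)."""
    return num_parameters * BYTES_PER_FLOAT32 / 1_000_000


proposed = weights_size_mb(542_514)        # parameter count of the proposed model
vgg16_based = weights_size_mb(15_000_000)  # approximate count quoted for VGG16

print(f"Proposed model: ~{proposed:.1f} MB")
print(f"VGG16-based model: ~{vgg16_based:.1f} MB")
print(f"Ratio: ~{vgg16_based / proposed:.0f}x")
```

Under these assumptions the proposed model's weights fit in a few megabytes, which is what makes distribution by flash drive or slow connection plausible.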

In terms of malaria diagnosis within the context of on-the-ground time and computing constraints, the proposed model will save physicians significant time and protect heavily against misdiagnosis. It may also inform physicians on which diagnoses should be revisited with a more thorough procedure. This model can help save time, money, and human lives if applied by physicians dealing with malaria patients.

Recommendations for Implementation¶

This model should be used as a tool to enhance physicians' malaria diagnosis capabilities; it is not meant to be a replacement for clinicians. Model diagnoses should be used to double check physician conclusions and vice versa. Cases where the model and physician disagree should be investigated more thoroughly, perhaps with more thorough diagnosis methodologies, with the physician making the final call as to a patient's status as infected or uninfected.

Stakeholders should see to it that the model is tested in real-world scenarios and validated by physicians. The model's lightweight nature allows for it to be easily shared with doctors and run on a wide variety of machines that doctors may be using, in a plethora of contexts and conditions under which malaria diagnosis is required. As such it should be distributed to hospitals and clinics for their own use and testing, with their own unique data. Feedback should be sought out from those using this model, and the model should be iterated on accordingly. More data from a variety of sources should also be used to update this model's training. This model may also be used as a base model for further training -- the model's architecture and its parameters might be updated on-the-ground based on new data, which is again made easier by the model's quick and lightweight nature. An additional recommendation would be to adapt this model to run on a smartphone, such as an iPhone or Android.

As stated prior, this model's lightweight nature will allow it to run more easily in conditions where internet access, computing and technical resources, and human capital are limited. Its simplicity is its strength; it is faster and easier to run, takes up less space on hard drives, and is highly adaptable thanks to faster model re-training should on-the-ground clinicians see fit to do so.

The model's ability to make malaria diagnosis more reliable and accurate will have significant economic and quality-of-life effects in places where malaria is prevalent. In 2021, the WHO found that what they define as the African region carries 95% of malaria cases, and 96% of malaria deaths [2]. Surprisingly, malaria is actually overdiagnosed [4]. This means that clinical material, doctors' time, and transportation costs are all wasted when malaria is diagnosed as a false positive -- all resources that could be used on those who are truly infected with malaria. In 2019, Haakenstad et al. found that global spending on malaria from both government and out-of-pocket expenditure amounted to $4.3 billion USD [5].

At present, it is not possible to determine how much of this money is overspent thanks to overdiagnosis of malaria. However, data from North-Eastern Tanzania, collected by Mosha et al., suggests a conservative 15 to 45% range for misdiagnosis [4]. Extrapolating this range globally suggests that the proposed model may save an estimated 0.65 to 1.9 billion USD annually. Distribution of this model will cost virtually nothing, as most clinics and hospitals will have the on-site resources to run this model, and doctors and lab technicians will easily be able to run the model with minimal instruction. There will be no need for cloud computing expenses and minimal need for hardware distribution in the cases where the model cannot be downloaded over the internet; all that would be needed in such a case is a small flash drive distributed to the hospital. As an overinflated estimate, we may assume at maximum such costs will amount to 100 thousand dollars.
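The extrapolation above can be reproduced with a few lines of arithmetic. This is only a sketch of the estimate as stated in the text: it assumes that the Tanzanian misdiagnosis range applies globally and that all spending tied to misdiagnosis is recoverable, both of which are simplifying assumptions.

```python
GLOBAL_SPEND_USD = 4.3e9           # global malaria spending cited in the text [5]
MISDIAGNOSIS_RANGE = (0.15, 0.45)  # misdiagnosis range cited in the text [4]
DISTRIBUTION_COST_USD = 100_000    # the text's deliberately inflated cost ceiling

# Assumption: spending attributable to misdiagnosis scales linearly with its rate.
low, high = (GLOBAL_SPEND_USD * share for share in MISDIAGNOSIS_RANGE)

print(f"Gross savings: {low / 1e9:.3f} to {high / 1e9:.3f} billion USD per year")
print(f"Net of distribution: {(low - DISTRIBUTION_COST_USD) / 1e9:.3f} to "
      f"{(high - DISTRIBUTION_COST_USD) / 1e9:.3f} billion USD per year")
```

Even the inflated distribution cost is several orders of magnitude smaller than the low end of the estimated savings, so it does not change the rounded figures.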

Key risks and challenges include the possibility of technical difficulties on-the-ground with the model, such as data formatting issues. A weakness of this model is that it has been trained on one dataset with relatively high quality images, which is why it has been recommended to train the model on more data from a variety of sources. However, this weakness has been mitigated significantly through the data augmentation and regularization practices mentioned previously. Another challenge will be running this model in places where electricity is unreliable -- for these situations, battery powered devices such as laptops and phones are recommended. Running the model on personal devices like these will also mitigate losses from stolen equipment, which is an unfortunate reality in some areas of developing countries. The model is strong overall, but it will need more exposure to real-world data to see how it performs in clinical contexts.

As mentioned prior, further analysis should include more training with more data, as well as double-checking results by physicians. Additionally, the model should be exposed to different strains of malaria in different stages. To further maximize the effectiveness of any particular dataset, the Albumentations data augmentation library may also be used, as it offers significantly more image transformations than TensorFlow's native ImageDataGenerator class [3]. For example, Gaussian blurring may be used to augment image data. Different pre-trained models, besides VGG16, may also be tested. It may also be possible to significantly augment model performance with patient-specific data, although this is a significantly more complex task and ventures deeper into the growing field of precision medicine. More general changes may be made to model architecture as well, by way of modifying things like activation functions and learning rates.

Overall, this model will prove useful to physicians on-the-ground, and will benefit the lives of millions, and allow governments, charity organizations, and research groups to more effectively allocate their resources by virtue of cutting down on waste expenses.

References:¶

[1] https://pubmed.ncbi.nlm.nih.gov/12364370/

[2] https://www.who.int/news-room/fact-sheets/detail/malaria

[3] https://albumentations.ai/

[4] https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0008707

[5] https://pubmed.ncbi.nlm.nih.gov/31036511/

Detailed Report and Code¶

Problem Definition

The context: Why is this problem important to solve?

  • Malaria is one of the oldest and deadliest diseases known to humanity. In 2021, it killed 619,000 people. It is a parasitic infection, primarily spread by mosquitoes, and is predominantly found in Africa. In 2021, the African region, as defined by the WHO, carried 95% of malaria cases and 96% of malaria deaths. Helping to solve this problem, and making diagnosis of infection easier, would make malaria significantly less deadly, as in many cases it is possible to treat the infection. Reference: (https://www.who.int/news-room/fact-sheets/detail/malaria)

The objectives: What is the intended goal?

  • The goal of this project is to create a malaria image classifier that looks at cells both infected and not infected by malaria in order to determine which is which. High accuracy, and especially high recall for parasitized cells, is desirable because our goal is to minimize false negatives: it is better to get a false positive than a false negative. Overall, we are trying to reduce the deadliness of malaria by making accurate detection easier, no matter what stage of development the parasitic infection is in.

The key questions: What are the key questions that need to be answered?

  • What are the key visual features that distinguish a malaria-infected cell from an uninfected one?
  • What is the most suitable method and neural network architecture to ensure maximization of accuracy and recall?
  • How can we balance computational constraints with the need for high accuracy? How complex must the model be, and how simple can the model be?

The problem formulation: What is it that we are trying to solve using data science?

  • We are trying to solve the problem of easy malaria detection with data science, to try and make it easier for doctors and clinicians to make confident decisions about the diagnosis of patients who potentially have Malaria. This model may also help doctors decide on diagnosis in the hard cases where only very subtle features and signs of infection are present. Data science will be a wonderful tool to apply to this problem, since diagnosis largely comes down to pattern recognition. The nature of this problem is extracting features from images and making decisions about the image, which is what data science tools can help us with.
  • More broadly speaking, we are trying to use data science to help save the lives of many people who may fall victim to malaria infection.

Data Description ¶

There are a total of 24,958 train and 2,600 test images (colored) that we have taken from microscopic images. These images are of the following categories:

Parasitized: The parasitized cells contain the Plasmodium parasite which causes malaria
Uninfected: The uninfected cells are free of the Plasmodium parasites

The dataset appears to come from a single source. The images are .png files, and are each approximately 6-16KB in size, with most hovering around 12KB. Pixel density is 72ppi, and most images are around 120 x 120 pixels, although this does vary, and images are not necessarily square. These are images of parasitized and uninfected individual cells.

Mount the Drive

In [1]:
from google.colab import drive
path = '/content/drive'
drive.mount(path)
Mounted at /content/drive

Loading libraries¶

In [2]:
# for path and file-handling
import os
import zipfile

import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

# working with tensors and images
import tensorflow as tf
from PIL import Image

Let us load the data¶

Note:

  • You must download the dataset from the link provided on Olympus and upload the same to your Google Drive. Then unzip the folder.
In [3]:
path = '/content/drive/MyDrive/MIT ADSP Capstone/cell_images.zip'

with zipfile.ZipFile(path, 'r') as zip_data:
  zip_data.extractall()

folder_path = '/content/cell_images'

The extracted folder has separate folders for train and test data. Each contains images of varying sizes for parasitized and uninfected cells, stored within a subfolder of the respective name.

The size of all images must be the same and should be converted to 4D arrays so that they can be used as an input for the convolutional neural network. Also, we need to create the labels for both types of images to be able to train and test the model.
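To illustrate what "same size" and "4D array" mean here, the sketch below resizes a few randomly generated stand-in images to a common size and stacks them. The nearest-neighbour `resize_nearest` helper and the random images are assumptions made for illustration only; the notebook itself relies on `image_dataset_from_directory` to do the resizing.

```python
import numpy as np

IMG_SIZE = 64  # same target size as used in this notebook


def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) image to (size, size, C)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows][:, cols]


# Hypothetical stand-ins for cell images of varying, non-square sizes.
rng = np.random.default_rng(0)
raw_images = [
    rng.integers(0, 256, size=(h, w, 3), dtype=np.uint8)
    for h, w in [(120, 118), (97, 133), (142, 142)]
]

# Stack into one 4D array: (number of images, height, width, channels).
batch = np.stack([resize_nearest(img, IMG_SIZE) for img in raw_images])
labels = np.array([0, 1, 0])  # one label per image, made up for this sketch

print(batch.shape, labels.shape)
```

The first axis indexes images, which is why a batch of inputs and a vector of labels of the same length can be fed to a network together.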

Let's do the same for the training data first and then we will use the same code for the test data as well.

In [4]:
from tensorflow.keras.utils import image_dataset_from_directory # for encoding and grabbing data from directory
In [5]:
# Define a constant image size:
IMG_SIZE = 64
In [6]:
# Create and use dataloaders to create sets for data analysis and exploration
# REF: https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory 

batch_size  = 128

train_set = image_dataset_from_directory(folder_path + '/train',
                                         subset='training',
                                         validation_split=0.2,
                                         seed=117,
                                         image_size=(IMG_SIZE, IMG_SIZE),
                                         batch_size=batch_size
                                         )

val_set = image_dataset_from_directory(folder_path + '/train',
                                         subset='validation',
                                         validation_split=0.2,
                                         seed=117,
                                         image_size=(IMG_SIZE, IMG_SIZE),
                                         batch_size=batch_size
                                         )

test_set = image_dataset_from_directory(folder_path + '/test',
                                         seed=117,
                                         image_size=(IMG_SIZE, IMG_SIZE),
                                         batch_size=batch_size
                                         )
Found 24958 files belonging to 2 classes.
Using 19967 files for training.
Found 24958 files belonging to 2 classes.
Using 4991 files for validation.
Found 2600 files belonging to 2 classes.
In [7]:
# Check the class names:
train_set.class_names
Out[7]:
['parasitized', 'uninfected']

From here, we can tell that 'parasitized' and 'uninfected' correspond to 0 and 1 labels, respectively.
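The mapping follows from the order of the class names: with inferred labels, the index of each name in `class_names` is its integer label, and the names are sorted alphanumerically. A minimal sketch of that rule, using the two folder names from this dataset:

```python
# Folder names as they might be listed on disk, in arbitrary order.
class_dirs = ["uninfected", "parasitized"]

# Class names are sorted alphanumerically; the position in the sorted
# list becomes the integer label.
class_names = sorted(class_dirs)
label_of = {name: index for index, name in enumerate(class_names)}

print(class_names)
print(label_of)
```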

Check the shape of train and test images

In [8]:
for image_batch, labels_batch in train_set:
  print("Shape of training set images:", image_batch.shape)
  break

for image_batch, labels_batch in test_set:
  print("Shape of test set images:", image_batch.shape)
  break
Shape of training set images: (128, 64, 64, 3)
Shape of test set images: (128, 64, 64, 3)

Observations and insights: There are 128 images in a batch, each 64 x 64 pixels with 3 RGB channels.

Check the shape of train and test labels

In [9]:
for image_batch, labels_batch in train_set:
  print("Shape of training set labels:", labels_batch.shape)
  break

for image_batch, labels_batch in test_set:
  print("Shape of test set labels:", labels_batch.shape)
  break
Shape of training set labels: (128,)
Shape of test set labels: (128,)

Observations and insights: 128 labels in a batch.

Check the minimum and maximum range of pixel values for train and test images

In [10]:
# Unbatch all sets of images, then convert to numpy, then to dataframe
train_set = train_set.unbatch()
val_set = val_set.unbatch()
test_set = test_set.unbatch()

# use as_numpy_iterator(), a method from tensorflow's dataset object to convert items into numpy arrays

train_set = list(train_set.as_numpy_iterator())
val_set = list(val_set.as_numpy_iterator())
test_set = list(test_set.as_numpy_iterator())

train_set = np.array(train_set, dtype=object)
val_set = np.array(val_set, dtype=object)
test_set = np.array(test_set, dtype=object)

# for ease of exploratory data analysis, we will use dataframes
train_df = pd.DataFrame(train_set)
val_df = pd.DataFrame(val_set)
test_df = pd.DataFrame(test_set)
In [11]:
# Images are represented by pixel values ranging from 0-255. Find the smallest
# and largest pixel value actually present in each dataset.
# The names pixel_min/pixel_max avoid shadowing Python's built-in min() and max().
def pixel_range(images):
  '''Return the (minimum, maximum) pixel value over a series of image arrays.'''
  pixel_min = 255 # start at the top of the range and work downwards
  pixel_max = 0   # start at the bottom of the range and work upwards
  for img in images:
    pixel_min = min(pixel_min, np.min(img))
    pixel_max = max(pixel_max, np.max(img))
  return pixel_min, pixel_max

print("Min, Max for training dataset:", *pixel_range(train_df[0]))
print("Min, Max for val dataset:", *pixel_range(val_df[0]))
print("Min, Max for test dataset:", *pixel_range(test_df[0]))
Min, Max for training dataset: 0.0 255.0
Min, Max for val dataset: 0.0 255.0
Min, Max for test dataset: 0.0 255.0

The training and validation sets may be grouped together for the purpose of this question.

The training dataset (min, max) is (0.0, 255.0).

The test dataset (min, max) is (0.0, 255.0).

Observations and insights: The maximum pixel value is 255, and the minimum pixel value is 0. This is true across the training, validation, and test sets.

Count the number of values in both uninfected and parasitized¶

Note:

  • 'Parasitized' labeled as 0
  • 'Uninfected' labeled as 1
In [12]:
# Where 0 is parasitized, and 1 is uninfected.
# sort_index() so we can see the values according to label clearly, instead of ranking by frequency. 
train_df[1].value_counts().sort_index()
Out[12]:
0    10102
1     9865
Name: 1, dtype: int64
In [13]:
val_df[1].value_counts().sort_index()
Out[13]:
0    2480
1    2511
Name: 1, dtype: int64
In [14]:
test_df[1].value_counts().sort_index()
Out[14]:
0    1300
1    1300
Name: 1, dtype: int64
In [15]:
count_check = len(train_df) + len(val_df) + len(test_df)
# if we have done everything right, this will evaluate to 27558 and match
# the sum of parasitized and uninfected across all datasets, and images in the folders
# as given by the problem statement (i.e. 24958 + 2600)

if count_check == (24958 + 2600):
  print("The number of images in all sets matches the number given by the problem statement.")
The number of images in all sets matches the number given by the problem statement.

In the training dataset: 10102 parasitized, 9865 uninfected.

In the validation dataset: 2480 parasitized, 2511 uninfected.

(Training + Validation dataset: 12582 parasitized, 12376 uninfected).

In the test dataset: 1300 parasitized, 1300 uninfected.

Normalize the images¶

In [16]:
# train_df[0] represents a series of all images
# train_df[1] represents a series of all labels, corresponding to those images.
# same pattern repeated for val_df and test_df
train_df[0] = train_df[0]/255.0
val_df[0] = val_df[0]/255.0 
test_df[0] = test_df[0]/255.0 

Observations and insights: Data has been normalized by dividing by 255.0.
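A tiny sketch of the effect of this step, using a made-up 2 x 2 array rather than a real image: dividing by 255.0 maps the original 0-255 range onto 0-1 without changing the relative ordering of pixel values.

```python
import numpy as np

# Made-up pixel values spanning the full 0-255 range (not taken from the dataset).
pixels = np.array([[0, 64], [128, 255]], dtype=np.float32)

scaled = pixels / 255.0  # same operation as applied to each image above

print(scaled.min(), scaled.max())
```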

Plot to check if the data is balanced¶

In [17]:
# put all labels in series variables for ease of use and clarity
s1 = train_df[1]
s2 = val_df[1]
s3 = test_df[1]

#get a count of uninfected and parasitized labels across all the data
label_0 = s1.value_counts()[0] + s2.value_counts()[0] + s3.value_counts()[0]
label_1 = s1.value_counts()[1] + s2.value_counts()[1] + s3.value_counts()[1]
In [18]:
# Create a dataframe with the labels and their counts
d = {'Uninfected' : label_1, 'Parasitized' : label_0} 
labels_df = pd.DataFrame(d, index=['Label Count'])
labels_df
Out[18]:
Uninfected Parasitized
Label Count 13676 13882
In [19]:
# plot
sns.barplot(labels_df)
plt.title('Classification Counts');

Observations and insights: The data is balanced. There are 13676 'uninfected' instances, and 13882 'parasitized' instances, so slightly more parasitized instances.
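The balance can also be expressed as class shares. The sketch below uses only the two counts reported above.

```python
# Label counts across the training, validation, and test sets, as reported above.
counts = {"Parasitized": 13_882, "Uninfected": 13_676}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.2%}")
```

Both classes sit within one percentage point of an even split, so no resampling or class weighting is needed on balance grounds alone.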

Data Exploration¶

Let's visualize the images from the train data

In [20]:
# For the sake of labeling and seeing on plots, create a list of the labels as strings

train_labels = train_df[1].map({0: 'Parasitized', 1: 'Uninfected'})
val_labels = val_df[1].map({0: 'Parasitized', 1: 'Uninfected'})
test_labels = test_df[1].map({0: 'Parasitized', 1: 'Uninfected'})

Observations and insights: We have now created a series of string labels for each dataset.

Visualize the images with subplot(6, 6) and figsize = (12, 12)¶

Please note: due to the constraints of the instructions, labels above the bottom row are cut off. Another plot has been included below the first, adjusted for the labels.

In [21]:
plt.figure(figsize=(12, 12))
for i in range(36):
  plt.subplot(6, 6, i+1)
  plt.imshow(train_df[0][i])
  plt.xlabel(train_labels[i])

plt.show()
In [22]:
plt.figure(figsize=(13, 15)) # adjusted for the labels to show in the plot
for i in range(36):
  plt.subplot(6, 6, i+1)
  plt.imshow(train_df[0][i])
  plt.xlabel(train_labels[i])

plt.show()

Observations and insights: Parasitized images have distinct discoloration and features. Uninfected images look more featureless, more "monotone" and even-colored. Parasitized images also generally have more edges, and are more misshapen, where the uninfected images are generally rounder and smoother in terms of edges.

Plotting the mean images for parasitized and uninfected¶

In [23]:
def plot_mean_img(img_arr1, img_arr2, img_arr3, label = "Label"):
  '''A function for concatenating three series of images and plotting their mean image.
  Three arguments, img_arr1 to img_arr3, and an optional "label" parameter to show
  which set is being plotted.'''

  imgs = pd.concat([img_arr1, img_arr2, img_arr3]) # concatenate the image arrays
  m = np.mean(imgs) # generate the mean image
  # plot
  plt.imshow(m)
  plt.xlabel(label)
In [24]:
# separate out uninfected and parasitized instances, take subsets of dataframe based on condition of label value
train_df_uninfected = train_df.loc[train_df[1] == 1][0]
train_df_parasitized = train_df.loc[train_df[1] == 0][0]
val_df_uninfected = val_df.loc[val_df[1] == 1][0]
val_df_parasitized = val_df.loc[val_df[1] == 0][0]
test_df_uninfected = test_df.loc[test_df[1] == 1][0]
test_df_parasitized = test_df.loc[test_df[1] == 0][0]

Mean image for parasitized

In [25]:
plot_mean_img(train_df_parasitized, val_df_parasitized, test_df_parasitized, 'Parasitized')

Mean image for uninfected

In [26]:
plot_mean_img(train_df_uninfected, val_df_uninfected, test_df_uninfected, 'Uninfected')

Observations and insights: The average uninfected and parasitized images do not look very different to the untrained human eye, although the parasitized average does look slightly redder in color.
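For reference, a mean image is the pixel-wise average over all images in a set. A minimal sketch with three made-up, constant-valued 2 x 2 RGB images (not taken from the dataset):

```python
import numpy as np

# Three made-up images, each filled with a single normalized value.
images = [np.full((2, 2, 3), value, dtype=np.float32) for value in (0.2, 0.4, 0.9)]

# Stack to shape (3, 2, 2, 3) and average over the image axis only,
# which keeps the height, width, and channel axes intact.
mean_image = np.mean(np.stack(images), axis=0)

print(mean_image.shape)
print(mean_image[0, 0])
```

Because each output pixel averages the same position across thousands of cells, small features that appear at different positions in different images tend to wash out, which may be why the two mean images look so similar.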

Converting RGB to HSV of Images using OpenCV

Converting the train data

In [27]:
import cv2
# REF: https://www.tutorialspoint.com/how-to-convert-an-rgb-image-to-hsv-image-using-opencv-python 
In [28]:
# note that the training data was initially split into training and validation sets
# by tensorflow's image dataloader function
# images loaded by tensorflow are in RGB channel order, so COLOR_RGB2HSV is used

train_hsv_list = [] # a list for the hsv images
for img in train_df[0]: # loop through pandas series image by image and convert
  hsv_img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
  train_hsv_list.append(hsv_img)

val_hsv_list = []
for img in val_df[0]:
  hsv_img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
  val_hsv_list.append(hsv_img)

#print(len(train_hsv_list), len(val_hsv_list)) # to verify that no image was missed, should be 19967 and 4991
In [29]:
plt.figure(figsize=(8, 9)) # adjusted for the labels to show in the plot
for i in range(9):
  plt.subplot(3, 3, i+1)
  plt.imshow(train_hsv_list[i])
  plt.xlabel(train_labels[i])

plt.show()
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

Converting the test data

In [30]:
test_hsv_list = []
for img in test_df[0]:
  # images loaded by tensorflow are in RGB channel order
  hsv_img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
  test_hsv_list.append(hsv_img)

# print(len(test_hsv_list)) # to verify that no image was missed, should be 2600
In [31]:
plt.figure(figsize=(8, 9)) # adjusted for the labels to show in the plot
for i in range(9):
  plt.subplot(3, 3, i+1)
  plt.imshow(test_hsv_list[i])
  plt.xlabel(test_labels[i])

plt.show()
WARNING:matplotlib.image:Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).

Observations and insights: HSV images have more aggressive color change, notable in the parasitized cells especially. Visual markers for parasitic infection are easier to see (greater contrast between discolored feature and pink/magenta background or average coloration of the cell).
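To make the HSV representation concrete, the sketch below converts single pixels with Python's standard-library `colorsys` module rather than OpenCV. The two sample colors are hypothetical and were not measured from the dataset. Hue is reported in degrees because OpenCV also returns hue in the 0-360 range for floating-point input, which falls outside the 0-1 range that `imshow` expects for floats and is consistent with the clipping warning shown above.

```python
import colorsys


def to_hsv_degrees(r: float, g: float, b: float) -> tuple:
    """Convert normalized RGB (each in 0-1) to (hue in degrees, saturation, value)."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # colorsys returns hue in the range 0-1
    return h * 360.0, s, v


# Hypothetical pixel colors, chosen only for illustration.
samples = {
    "pale pink cell body": (0.90, 0.70, 0.75),
    "dark purple stain": (0.40, 0.10, 0.50),
}

for name, rgb in samples.items():
    hue, sat, val = to_hsv_degrees(*rgb)
    print(f"{name}: hue={hue:.0f} deg, saturation={sat:.2f}, value={val:.2f}")
```

In this representation a stained region differs from the cell body mainly in saturation and value, which is one way to read the stronger contrast noted in the observation above.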

Processing Images using Gaussian Blurring¶

Gaussian Blurring on train data

In [32]:
plt.figure(figsize=(12, 10)) # adjusted for the labels to show in the plot
for i in range(9):
  plt.subplot(3, 3, i+1)
  sigmaX = np.random.randint(0, 11) # randomly pick a sigmaX between 0 and 10
  plt.imshow(cv2.GaussianBlur(src=train_df[0][i], ksize=(3 ,3), sigmaX=sigmaX))
  plt.xlabel(train_labels[i] + ", sigmaX = " + str(sigmaX))

plt.show()
In [33]:
plt.figure(figsize=(12, 10)) # adjusted for the labels to show in the plot
for i in range(9):
  plt.subplot(3, 3, i+1)
  k = np.random.randint(0, 2) # randomly pick a ksize
  if k == 0:
    ksize = (3, 3)
  else:
    ksize = (5, 5)
  plt.imshow(cv2.GaussianBlur(src=train_df[0][i], ksize=ksize, sigmaX=0))
  plt.xlabel(train_labels[i] + ", ksize = " + str(ksize))

plt.show()

Gaussian Blurring on test data

In [34]:
plt.figure(figsize=(12, 10)) # adjusted for the labels to show in the plot
for i in range(9):
  plt.subplot(3, 3, i+1)
  sigmaX = np.random.randint(0, 11)
  plt.imshow(cv2.GaussianBlur(src=test_df[0][i], ksize=(3 ,3), sigmaX=sigmaX))
  plt.xlabel(test_labels[i] + ", sigmaX = " + str(sigmaX))

plt.show()
In [35]:
plt.figure(figsize=(12, 10)) # adjusted for the labels to show in the plot
for i in range(9):
  plt.subplot(3, 3, i+1)
  k = np.random.randint(0, 2)
  if k == 0:
    ksize = (3, 3)
  else:
    ksize = (5, 5)
  plt.imshow(cv2.GaussianBlur(src=test_df[0][i], ksize=ksize, sigmaX=0))
  plt.xlabel(test_labels[i] + ", ksize = " + str(ksize))

plt.show()

Observations and insights: Gaussian blurring smooths the image. A higher sigmaX value appears to make the image slightly blurrier, but the difference is difficult to see by eye. sigmaX defines the standard deviation of the Gaussian along the X axis; when sigmaY is left undefined, it is taken to be the same as sigmaX. Note that sigmaX = 0 does not mean "no blur": OpenCV then derives the standard deviation from the kernel size.

A larger kernel size is associated with more blurring/smoothing, and a larger standard deviation (sigmaX) is also associated with more blur because it places greater weight on pixels farther from the center of the kernel window.
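
To make the effect of the standard deviation concrete, the sketch below builds a normalized 1-D Gaussian kernel with NumPy and prints the weights for a few sigma values. The helper `gaussian_kernel_1d` is a hypothetical name written for illustration; it is not OpenCV's implementation, which may use precomputed kernels for small sizes.

```python
import numpy as np

def gaussian_kernel_1d(ksize, sigma):
    """Normalized 1-D Gaussian weights for an odd kernel size (illustrative helper)."""
    half = (ksize - 1) / 2.0
    x = np.arange(ksize) - half                 # offsets from the kernel center
    weights = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return weights / weights.sum()              # weights sum to 1

for sigma in (0.5, 1.0, 3.0):
    print(sigma, np.round(gaussian_kernel_1d(5, sigma), 3))
```

As sigma grows, the center weight shrinks and the outer weights grow, which is why neighboring pixels contribute more to each output pixel and the image looks blurrier.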

REFS:

https://dsp.stackexchange.com/questions/10057/gaussian-blur-standard-deviation-radius-and-kernel-size

https://hackaday.com/2021/07/21/what-exactly-is-a-gaussian-blur/

Think About It: Would blurring help us for this problem statement in any way? What else can we try?

Yes, blurring can help with this problem because it makes the model more robust to the lower-resolution images that may appear in real-world applications. We may also try image rotations, truncation (cropping), distortions, discoloration, and other image data augmentation techniques.

Model Building¶

Base Model¶

Importing the required libraries for building and training our Model

In [36]:
from tensorflow.keras.models import Sequential, Model # sequential api for sequential model
from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, Rescaling
from tensorflow.keras.layers import BatchNormalization, Activation, Input, LeakyReLU
from tensorflow.keras import backend, losses, optimizers
from tensorflow.keras.optimizers import RMSprop, Adam, SGD # optimizers for modeling
from sklearn.metrics import confusion_matrix, classification_report
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
In [37]:
# Clearing the backend and setting random number seeds.
# this is done to clear up the backend from previous runs as the model is modified.
from tensorflow.keras import backend
import random

def reset_session(random_seed=117):
  '''Reset the Keras session and set the NumPy, random, and TensorFlow random seeds to random_seed (117 by default).'''
  np.random.seed(random_seed)
  random.seed(random_seed)
  tf.random.set_seed(random_seed)

  backend.clear_session()

reset_session()
In [38]:
# Re-import the data for ease of formatting and optimization.
# Create data loaders to build the train/validation/test sets and pass them to the neural networks.
# REF: https://www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory 

batch_size = 128
IMG_SIZE = 64

train_ds = image_dataset_from_directory(folder_path + '/train',
                                         validation_split=0.2,
                                         subset='training',
                                         seed=117,
                                         image_size=(IMG_SIZE, IMG_SIZE),
                                         batch_size=batch_size,
                                         label_mode='int'
                                         )
val_ds = image_dataset_from_directory(folder_path + '/train',
                                         validation_split=0.2,
                                         subset='validation',
                                         seed=117,
                                         image_size=(IMG_SIZE, IMG_SIZE),
                                         batch_size=batch_size,
                                         label_mode='int'
                                         )


test_ds = image_dataset_from_directory(folder_path + '/test',
                                         seed=117,
                                         image_size=(IMG_SIZE, IMG_SIZE),
                                         batch_size=batch_size,
                                         label_mode='int'
                                         )
Found 24958 files belonging to 2 classes.
Using 19967 files for training.
Found 24958 files belonging to 2 classes.
Using 4991 files for validation.
Found 2600 files belonging to 2 classes.
In [39]:
# get all images and labels from the datasets in arrays

# training
train_ds_unbatched = train_ds.unbatch()
X_train = []
y_train = []
for image_batch, labels_batch in train_ds_unbatched:
  X_train.append(image_batch.numpy())
  y_train.append(labels_batch.numpy())

X_train = np.asarray(X_train).astype('float32')
y_train = np.asarray(y_train).astype('float32')

# validation
val_ds_unbatched = val_ds.unbatch()
X_val = []
y_val = []
for image_batch, labels_batch in val_ds_unbatched:
  X_val.append(image_batch.numpy())
  y_val.append(labels_batch.numpy())

X_val = np.asarray(X_val).astype('float32')
y_val = np.asarray(y_val).astype('float32')

# test
test_ds_unbatched = test_ds.unbatch()
X_test = []
y_test = []
for image_batch, labels_batch in test_ds_unbatched:
  X_test.append(image_batch.numpy())
  y_test.append(labels_batch.numpy())

X_test = np.asarray(X_test).astype('float32')
y_test = np.asarray(y_test).astype('float32')
In [40]:
# use tf's AUTOTUNE function for performance optimization
# REF: https://www.tensorflow.org/guide/data_performance
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(500).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

One Hot Encoding the train and test labels

In [41]:
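# Label encoding is already handled when the data is loaded in cell 38:
# image_dataset_from_directory with label_mode='int' yields integer class labels,
# which is the format SparseCategoricalCrossentropy expects, so no one-hot encoding is needed.

# Alternatively, we could have one-hot encoded the labels with tf.keras.utils.to_categorical
# and used CategoricalCrossentropy as the loss.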
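
The relationship between integer labels and one-hot labels can be sketched with NumPy alone. The probabilities below are made-up softmax outputs, not results from this notebook; the point is only that the sparse and one-hot forms of cross-entropy give the same per-sample loss.

```python
import numpy as np

labels = np.array([0, 1, 1, 0])      # integer labels, as produced by label_mode='int'
one_hot = np.eye(2)[labels]          # one-hot equivalent, shape (4, 2)

# Made-up softmax outputs for four images (each row sums to 1).
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.7, 0.3]])

# Sparse form: pick the predicted probability of the true class by index.
sparse_ce = -np.log(probs[np.arange(len(labels)), labels])
# Categorical form: weight the log-probabilities by the one-hot vector.
categorical_ce = -(one_hot * np.log(probs)).sum(axis=1)

print(np.allclose(sparse_ce, categorical_ce))
```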

Building the model

In [42]:
#initialize a sequential model
model0 = Sequential([ 
    Rescaling(1./255, input_shape=(IMG_SIZE, IMG_SIZE, 3)), #use this layer for normalization of freshly imported dataset
    Conv2D(16, 3, padding='same', activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(2, activation='softmax') # set this to 2 because we have 2 classes
])

Compiling the model

In [43]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model0.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model0.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 rescaling (Rescaling)       (None, 64, 64, 3)         0         
                                                                 
 conv2d (Conv2D)             (None, 64, 64, 16)        448       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 32, 32, 16)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        4640      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 16, 16, 32)       0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 8192)              0         
                                                                 
 dense (Dense)               (None, 128)               1048704   
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 dense_2 (Dense)             (None, 2)                 130       
                                                                 
=================================================================
Total params: 1,062,178
Trainable params: 1,062,178
Non-trainable params: 0
_________________________________________________________________
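
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard counting rules (one weight per kernel element per input channel per filter, plus one bias per filter or output unit); the helper names are hypothetical and exist only for this illustration.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter, for a square kernel."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units

flattened = 16 * 16 * 32  # 64x64 input halved twice by pooling, 32 channels

layer_params = [
    conv2d_params(3, 3, 16),       # conv2d
    conv2d_params(3, 16, 32),      # conv2d_1
    dense_params(flattened, 128),  # dense
    dense_params(128, 64),         # dense_1
    dense_params(64, 2),           # dense_2
]
print(layer_params, sum(layer_params))
```

Almost all of the parameters sit in the first dense layer, so the size of the flattened feature map is what mainly determines the model size.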

Using Callbacks

In [44]:
# create model checkpoint, filepath
model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model0_checkpoints/model0-{epoch:02d}-{val_accuracy:.4f}.hdf5'
callbacks = [
    EarlyStopping(monitor='val_loss', mode='min', patience=5, verbose=1),
    ModelCheckpoint(model_path, monitor='val_accuracy', mode='max', save_best_only=True, verbose=1)
]

Fit and train our Model

In [45]:
epochs = 15
history_0 = model0.fit(
    train_ds, 
    validation_data=val_ds,
    epochs=epochs, 
    callbacks=callbacks
)
Epoch 1/15
147/156 [===========================>..] - ETA: 0s - loss: 0.6311 - accuracy: 0.6393
Epoch 1: val_accuracy improved from -inf to 0.71328, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model0_checkpoints/model0-01-0.7133.hdf5
156/156 [==============================] - 20s 22ms/step - loss: 0.6293 - accuracy: 0.6407 - val_loss: 0.5557 - val_accuracy: 0.7133
Epoch 2/15
149/156 [===========================>..] - ETA: 0s - loss: 0.4965 - accuracy: 0.7591
Epoch 2: val_accuracy improved from 0.71328 to 0.85414, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model0_checkpoints/model0-02-0.8541.hdf5
156/156 [==============================] - 1s 7ms/step - loss: 0.4912 - accuracy: 0.7624 - val_loss: 0.3710 - val_accuracy: 0.8541
Epoch 3/15
151/156 [============================>.] - ETA: 0s - loss: 0.2357 - accuracy: 0.9082
Epoch 3: val_accuracy improved from 0.85414 to 0.94109, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model0_checkpoints/model0-03-0.9411.hdf5
156/156 [==============================] - 1s 7ms/step - loss: 0.2340 - accuracy: 0.9089 - val_loss: 0.1523 - val_accuracy: 0.9411
Epoch 4/15
148/156 [===========================>..] - ETA: 0s - loss: 0.1414 - accuracy: 0.9494
Epoch 4: val_accuracy improved from 0.94109 to 0.95652, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model0_checkpoints/model0-04-0.9565.hdf5
156/156 [==============================] - 1s 7ms/step - loss: 0.1407 - accuracy: 0.9495 - val_loss: 0.1171 - val_accuracy: 0.9565
Epoch 5/15
151/156 [============================>.] - ETA: 0s - loss: 0.1015 - accuracy: 0.9655
Epoch 5: val_accuracy did not improve from 0.95652
156/156 [==============================] - 1s 6ms/step - loss: 0.1020 - accuracy: 0.9653 - val_loss: 0.1412 - val_accuracy: 0.9503
Epoch 6/15
150/156 [===========================>..] - ETA: 0s - loss: 0.0782 - accuracy: 0.9747
Epoch 6: val_accuracy improved from 0.95652 to 0.96233, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model0_checkpoints/model0-06-0.9623.hdf5
156/156 [==============================] - 2s 13ms/step - loss: 0.0780 - accuracy: 0.9747 - val_loss: 0.1015 - val_accuracy: 0.9623
Epoch 7/15
149/156 [===========================>..] - ETA: 0s - loss: 0.0651 - accuracy: 0.9789
Epoch 7: val_accuracy improved from 0.96233 to 0.96714, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model0_checkpoints/model0-07-0.9671.hdf5
156/156 [==============================] - 1s 7ms/step - loss: 0.0647 - accuracy: 0.9788 - val_loss: 0.0994 - val_accuracy: 0.9671
Epoch 8/15
151/156 [============================>.] - ETA: 0s - loss: 0.0485 - accuracy: 0.9840
Epoch 8: val_accuracy did not improve from 0.96714
156/156 [==============================] - 1s 6ms/step - loss: 0.0488 - accuracy: 0.9840 - val_loss: 0.1090 - val_accuracy: 0.9653
Epoch 9/15
151/156 [============================>.] - ETA: 0s - loss: 0.0368 - accuracy: 0.9884
Epoch 9: val_accuracy did not improve from 0.96714
156/156 [==============================] - 1s 6ms/step - loss: 0.0367 - accuracy: 0.9884 - val_loss: 0.1025 - val_accuracy: 0.9641
Epoch 10/15
149/156 [===========================>..] - ETA: 0s - loss: 0.0298 - accuracy: 0.9906
Epoch 10: val_accuracy did not improve from 0.96714
156/156 [==============================] - 1s 7ms/step - loss: 0.0293 - accuracy: 0.9907 - val_loss: 0.1093 - val_accuracy: 0.9657
Epoch 11/15
148/156 [===========================>..] - ETA: 0s - loss: 0.0248 - accuracy: 0.9919
Epoch 11: val_accuracy did not improve from 0.96714
156/156 [==============================] - 1s 6ms/step - loss: 0.0243 - accuracy: 0.9921 - val_loss: 0.1162 - val_accuracy: 0.9661
Epoch 12/15
148/156 [===========================>..] - ETA: 0s - loss: 0.0165 - accuracy: 0.9948
Epoch 12: val_accuracy did not improve from 0.96714
156/156 [==============================] - 1s 6ms/step - loss: 0.0172 - accuracy: 0.9945 - val_loss: 0.1218 - val_accuracy: 0.9605
Epoch 12: early stopping
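
The stopping point in the log above follows from the patience rule. The sketch below is a simplified re-implementation of that rule in plain Python (not the Keras source), applied to the val_loss values printed for each epoch; `early_stopping_epoch` is a hypothetical helper name.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed; stop_epoch is None if never triggered."""
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: remember it and reset the counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# val_loss per epoch, copied from the training log above
val_losses = [0.5557, 0.3710, 0.1523, 0.1171, 0.1412, 0.1015,
              0.0994, 0.1090, 0.1025, 0.1093, 0.1162, 0.1218]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=5)
print(stop_epoch, best_epoch)
```

The lowest validation loss occurs at epoch 7, and five epochs without improvement end training at epoch 12. Because EarlyStopping is used here without `restore_best_weights=True`, the in-memory model presumably keeps the epoch 12 weights, while the checkpoint file holds the epoch with the best validation accuracy.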

Evaluating the model on test data

In [122]:
loss0, accuracy0 = model0.evaluate(test_ds, batch_size=128, verbose=2)
print(accuracy0)
21/21 - 1s - loss: 0.0970 - accuracy: 0.9665 - 753ms/epoch - 36ms/step
0.9665384888648987

Plotting the confusion matrix

In [47]:
y_predictions = model0.predict(X_test, batch_size=128, verbose=2)
y_predictions = np.argmax(y_predictions, axis=1) # need to get the highest probability indices along axis 1 (row axis)
21/21 - 0s - 193ms/epoch - 9ms/step
In [48]:
conf_matrix_model0 = tf.math.confusion_matrix(y_test, y_predictions)
plt.figure(figsize=(12, 8))
xlabels = ['Parasitized', 'Uninfected']
ylabels = ['Parasitized', 'Uninfected']
sns.heatmap(conf_matrix_model0, annot=True, fmt= '.0f', xticklabels=xlabels, yticklabels=ylabels)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
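
Per-class recall and precision can be read directly off a confusion matrix whose rows are actual classes and whose columns are predicted classes, as in the plot above. The counts in this sketch are hypothetical and are not this model's results.

```python
import numpy as np

# Hypothetical counts, NOT the notebook's results.
# Rows are actual classes, columns are predicted classes.
conf = np.array([[1270,   30],    # actual Parasitized
                 [  40, 1260]])   # actual Uninfected

recall = np.diag(conf) / conf.sum(axis=1)      # correct / all actual members of the class
precision = np.diag(conf) / conf.sum(axis=0)   # correct / all predictions of the class
accuracy = np.trace(conf) / conf.sum()

for name, r, p in zip(["Parasitized", "Uninfected"], recall, precision):
    print(f"{name}: recall={r:.3f}, precision={p:.3f}")
print(f"accuracy={accuracy:.3f}")
```

For malaria diagnosis, the top-right cell (parasitized cells predicted as uninfected) holds the false negatives, so recall for the Parasitized row is the figure to watch.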

Plotting the train and validation curves

In [49]:
plt.plot(history_0.history['accuracy'])
plt.plot(history_0.history['val_accuracy'])

plt.legend(['Train', 'Validation'])

plt.title('Accuracy vs. Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')

plt.show()

It appears we have slight overfitting: training accuracy climbs to about 0.99 while validation accuracy plateaus around 0.96 to 0.97. Regularization techniques, such as data augmentation and dropout layers, may help with this.

Next, let's build another model with a few additional layers and different activation functions to see whether we can improve performance.

Model 1¶

Trying to improve the performance of our model by adding new layers

Building the Model

In [50]:
# first, reset the backend:
reset_session()

model1 = Sequential([ 
    Rescaling(1./255, input_shape=(IMG_SIZE, IMG_SIZE, 3)), #use this layer for normalization of freshly imported dataset
    Conv2D(16, 3, padding='same', activation='tanh'),
    MaxPooling2D((2, 2)),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(16, 3, padding='same', activation='relu'),
    Flatten(), # flatten before passing data onto fully connected layers
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(2, activation='softmax') # set this to 2 because we have 2 classes
])

In this model, we have changed:

  • 1st Conv2D layer's activation function to 'tanh' from 'relu'
  • Added a 3rd Conv2D layer with 16 filters, after the second MaxPooling2D layer

Compiling the model

In [51]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model1.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model1.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 rescaling (Rescaling)       (None, 64, 64, 3)         0         
                                                                 
 conv2d (Conv2D)             (None, 64, 64, 16)        448       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 32, 32, 16)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        4640      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 16, 16, 32)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 16, 16, 16)        4624      
                                                                 
 flatten (Flatten)           (None, 4096)              0         
                                                                 
 dense (Dense)               (None, 128)               524416    
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 dense_2 (Dense)             (None, 2)                 130       
                                                                 
=================================================================
Total params: 542,514
Trainable params: 542,514
Non-trainable params: 0
_________________________________________________________________

Using Callbacks

In [52]:
# create model checkpoint filepath
# model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model1_checkpoints/model1-{epoch:02d}-{val_accuracy:.4f}.hdf5'
model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model1_checkpoints/model1.hdf5'
callbacks = [
    EarlyStopping(monitor='val_loss', mode='min', patience=5, verbose=1),
    ModelCheckpoint(model_path, monitor='val_accuracy', mode='max', save_best_only=True, verbose=1)
]

Fit and Train the model

In [53]:
epochs = 15
history_1 = model1.fit(
    train_ds, 
    validation_data=val_ds,
    epochs=epochs, 
    callbacks=callbacks
)
Epoch 1/15
153/156 [============================>.] - ETA: 0s - loss: 0.6239 - accuracy: 0.6494
Epoch 1: val_accuracy improved from -inf to 0.79223, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model1_checkpoints/model1.hdf5
156/156 [==============================] - 4s 16ms/step - loss: 0.6222 - accuracy: 0.6516 - val_loss: 0.4970 - val_accuracy: 0.7922
Epoch 2/15
156/156 [==============================] - ETA: 0s - loss: 0.3015 - accuracy: 0.8786
Epoch 2: val_accuracy improved from 0.79223 to 0.92527, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model1_checkpoints/model1.hdf5
156/156 [==============================] - 1s 7ms/step - loss: 0.3015 - accuracy: 0.8786 - val_loss: 0.1899 - val_accuracy: 0.9253
Epoch 3/15
156/156 [==============================] - ETA: 0s - loss: 0.1416 - accuracy: 0.9484
Epoch 3: val_accuracy improved from 0.92527 to 0.95712, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model1_checkpoints/model1.hdf5
156/156 [==============================] - 1s 7ms/step - loss: 0.1416 - accuracy: 0.9484 - val_loss: 0.1267 - val_accuracy: 0.9571
Epoch 4/15
153/156 [============================>.] - ETA: 0s - loss: 0.0947 - accuracy: 0.9677
Epoch 4: val_accuracy improved from 0.95712 to 0.96874, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model1_checkpoints/model1.hdf5
156/156 [==============================] - 1s 7ms/step - loss: 0.0952 - accuracy: 0.9675 - val_loss: 0.0966 - val_accuracy: 0.9687
Epoch 5/15
151/156 [============================>.] - ETA: 0s - loss: 0.0686 - accuracy: 0.9760
Epoch 5: val_accuracy improved from 0.96874 to 0.97536, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model1_checkpoints/model1.hdf5
156/156 [==============================] - 1s 7ms/step - loss: 0.0680 - accuracy: 0.9763 - val_loss: 0.0846 - val_accuracy: 0.9754
Epoch 6/15
149/156 [===========================>..] - ETA: 0s - loss: 0.0626 - accuracy: 0.9792
Epoch 6: val_accuracy improved from 0.97536 to 0.97636, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model1_checkpoints/model1.hdf5
156/156 [==============================] - 1s 7ms/step - loss: 0.0621 - accuracy: 0.9794 - val_loss: 0.0851 - val_accuracy: 0.9764
Epoch 7/15
148/156 [===========================>..] - ETA: 0s - loss: 0.0460 - accuracy: 0.9845
Epoch 7: val_accuracy improved from 0.97636 to 0.97716, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model1_checkpoints/model1.hdf5
156/156 [==============================] - 1s 7ms/step - loss: 0.0457 - accuracy: 0.9845 - val_loss: 0.0793 - val_accuracy: 0.9772
Epoch 8/15
148/156 [===========================>..] - ETA: 0s - loss: 0.0380 - accuracy: 0.9871
Epoch 8: val_accuracy did not improve from 0.97716
156/156 [==============================] - 1s 7ms/step - loss: 0.0380 - accuracy: 0.9872 - val_loss: 0.0868 - val_accuracy: 0.9742
Epoch 9/15
154/156 [============================>.] - ETA: 0s - loss: 0.0304 - accuracy: 0.9897
Epoch 9: val_accuracy did not improve from 0.97716
156/156 [==============================] - 1s 6ms/step - loss: 0.0303 - accuracy: 0.9897 - val_loss: 0.0931 - val_accuracy: 0.9740
Epoch 10/15
148/156 [===========================>..] - ETA: 0s - loss: 0.0252 - accuracy: 0.9919
Epoch 10: val_accuracy did not improve from 0.97716
156/156 [==============================] - 1s 7ms/step - loss: 0.0250 - accuracy: 0.9920 - val_loss: 0.0895 - val_accuracy: 0.9750
Epoch 11/15
155/156 [============================>.] - ETA: 0s - loss: 0.0174 - accuracy: 0.9949
Epoch 11: val_accuracy did not improve from 0.97716
156/156 [==============================] - 1s 7ms/step - loss: 0.0173 - accuracy: 0.9949 - val_loss: 0.1214 - val_accuracy: 0.9734
Epoch 12/15
152/156 [============================>.] - ETA: 0s - loss: 0.0144 - accuracy: 0.9954
Epoch 12: val_accuracy did not improve from 0.97716
156/156 [==============================] - 1s 7ms/step - loss: 0.0143 - accuracy: 0.9953 - val_loss: 0.1029 - val_accuracy: 0.9742
Epoch 12: early stopping

Evaluating the model

In [123]:
loss1, accuracy1 = model1.evaluate(test_ds, batch_size=128, verbose=2)
print(accuracy1)
21/21 - 1s - loss: 0.0809 - accuracy: 0.9738 - 1s/epoch - 61ms/step
0.9738461375236511

Plotting the confusion matrix

In [55]:
y_predictions = model1.predict(X_test, batch_size=128, verbose=2)
y_predictions = np.argmax(y_predictions, axis=1) # need to get the highest probability indices along axis 1 (row axis)
len(y_predictions) # will get 2600

conf_matrix_model1 = tf.math.confusion_matrix(y_test, y_predictions)
plt.figure(figsize=(12, 8))
xlabels = ['Parasitized', 'Uninfected']
ylabels = ['Parasitized', 'Uninfected']
sns.heatmap(conf_matrix_model1, annot=True, fmt= '.0f', xticklabels=xlabels, yticklabels=ylabels)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
21/21 - 0s - 154ms/epoch - 7ms/step

Plotting the train and the validation curves

In [56]:
plt.plot(history_1.history['accuracy'])
plt.plot(history_1.history['val_accuracy'])

plt.legend(['Train', 'Validation'])

plt.title('Accuracy vs. Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')

plt.show()

Think about it: Now let's build a model with LeakyReLU as the activation function

  • Can the model performance be improved if we change our activation function to LeakyReLU?
    • Possibly. LeakyReLU keeps a small, nonzero gradient for negative inputs, whereas normal ReLU outputs zero (with zero gradient) there. This avoids units that stop updating (the "dying ReLU" problem) and may help the model converge to a solution faster. Like ReLU, LeakyReLU is not differentiable at exactly 0.
  • Can BatchNormalization improve our model?
    • Possibly. BatchNormalization normalizes the inputs to successive layers, which can stabilize training, and it may also have a mild regularizing effect that reduces overfitting.
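
The behavior of ReLU and LeakyReLU on negative inputs can be sketched with NumPy. The functions below are illustrative stand-ins, not the Keras layers; the slope of 0.2 is chosen to match the value passed to LeakyReLU in this notebook.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.2):
    # Negative inputs are scaled by alpha instead of being zeroed out.
    return np.where(x > 0, x, alpha * x)

def relu_grad(x):
    # Gradient is 0 for negative inputs, so those units receive no update.
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.2):
    # Gradient is alpha for negative inputs, so some signal always flows back.
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(x), relu_grad(x))
print(leaky_relu(x), leaky_relu_grad(x))
```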

Let us try to build a model using BatchNormalization, with LeakyReLU as our activation function.

Model 2 with Batch Normalization¶

In [57]:
# first, reset the backend:
reset_session()

Building the Model

In [58]:
model2 = Sequential([ 
    Rescaling(1./255, input_shape=(IMG_SIZE, IMG_SIZE, 3)), #use this layer for normalization of freshly imported dataset
    Conv2D(16, 3, padding='same'),
    LeakyReLU(0.2),
    MaxPooling2D((2, 2)),
    Conv2D(32, 3, padding='same'),
    LeakyReLU(0.2),
    MaxPooling2D((2, 2)),
    Conv2D(16, 3, padding='same'),
    LeakyReLU(0.2),
    Flatten(),
    Dense(128),
    LeakyReLU(0.2),
    Dense(64),
    LeakyReLU(0.2),
    BatchNormalization(),
    Dense(2, activation='softmax') # set this to 2 because we have 2 classes
])

Compiling the model

In [59]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model2.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model2.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 rescaling (Rescaling)       (None, 64, 64, 3)         0         
                                                                 
 conv2d (Conv2D)             (None, 64, 64, 16)        448       
                                                                 
 leaky_re_lu (LeakyReLU)     (None, 64, 64, 16)        0         
                                                                 
 max_pooling2d (MaxPooling2D  (None, 32, 32, 16)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        4640      
                                                                 
 leaky_re_lu_1 (LeakyReLU)   (None, 32, 32, 32)        0         
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 16, 16, 32)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 16, 16, 16)        4624      
                                                                 
 leaky_re_lu_2 (LeakyReLU)   (None, 16, 16, 16)        0         
                                                                 
 flatten (Flatten)           (None, 4096)              0         
                                                                 
 dense (Dense)               (None, 128)               524416    
                                                                 
 leaky_re_lu_3 (LeakyReLU)   (None, 128)               0         
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 leaky_re_lu_4 (LeakyReLU)   (None, 64)                0         
                                                                 
 batch_normalization (BatchN  (None, 64)               256       
 ormalization)                                                   
                                                                 
 dense_2 (Dense)             (None, 2)                 130       
                                                                 
=================================================================
Total params: 542,770
Trainable params: 542,642
Non-trainable params: 128
_________________________________________________________________

Using callbacks

In [60]:
# create model checkpoint, filepath
# model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model2_checkpoints/model2-{epoch:02d}-{val_accuracy:.4f}.hdf5'
model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model2_checkpoints/model2.hdf5'
callbacks = [
    EarlyStopping(monitor='val_loss', mode='min', patience=5, verbose=1),
    ModelCheckpoint(model_path, monitor='val_accuracy', mode='max', save_best_only=True, verbose=1)
]

Fit and train the model

In [61]:
epochs = 15
history_2 = model2.fit(
    train_ds, 
    validation_data=val_ds,
    epochs=epochs, 
    callbacks=callbacks
)
Epoch 1/15
152/156 [============================>.] - ETA: 0s - loss: 0.5385 - accuracy: 0.7345
Epoch 1: val_accuracy improved from -inf to 0.49689, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model2_checkpoints/model2.hdf5
156/156 [==============================] - 5s 16ms/step - loss: 0.5369 - accuracy: 0.7358 - val_loss: 1.0926 - val_accuracy: 0.4969
Epoch 2/15
153/156 [============================>.] - ETA: 0s - loss: 0.3584 - accuracy: 0.8445
Epoch 2: val_accuracy improved from 0.49689 to 0.89641, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model2_checkpoints/model2.hdf5
156/156 [==============================] - 1s 7ms/step - loss: 0.3570 - accuracy: 0.8449 - val_loss: 0.3857 - val_accuracy: 0.8964
Epoch 3/15
150/156 [===========================>..] - ETA: 0s - loss: 0.2630 - accuracy: 0.8955
Epoch 3: val_accuracy did not improve from 0.89641
156/156 [==============================] - 1s 7ms/step - loss: 0.2634 - accuracy: 0.8955 - val_loss: 7.0731 - val_accuracy: 0.4969
Epoch 4/15
150/156 [===========================>..] - ETA: 0s - loss: 0.2238 - accuracy: 0.9143
Epoch 4: val_accuracy did not improve from 0.89641
156/156 [==============================] - 1s 7ms/step - loss: 0.2222 - accuracy: 0.9146 - val_loss: 2.1897 - val_accuracy: 0.5923
Epoch 5/15
153/156 [============================>.] - ETA: 0s - loss: 0.1562 - accuracy: 0.9421
Epoch 5: val_accuracy did not improve from 0.89641
156/156 [==============================] - 1s 7ms/step - loss: 0.1558 - accuracy: 0.9422 - val_loss: 0.4959 - val_accuracy: 0.8567
Epoch 6/15
153/156 [============================>.] - ETA: 0s - loss: 0.1262 - accuracy: 0.9538
Epoch 6: val_accuracy did not improve from 0.89641
156/156 [==============================] - 1s 7ms/step - loss: 0.1273 - accuracy: 0.9534 - val_loss: 0.4944 - val_accuracy: 0.8431
Epoch 7/15
152/156 [============================>.] - ETA: 0s - loss: 0.1102 - accuracy: 0.9617
Epoch 7: val_accuracy did not improve from 0.89641
156/156 [==============================] - 1s 7ms/step - loss: 0.1105 - accuracy: 0.9614 - val_loss: 3.7523 - val_accuracy: 0.5063
Epoch 7: early stopping

Plotting the train and validation accuracy

In [62]:
plt.plot(history_2.history['accuracy'])
plt.plot(history_2.history['val_accuracy'])

plt.legend(['Train', 'Validation'])

plt.title('Accuracy vs. Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')

plt.show()

Evaluating the model

In [124]:
loss2, accuracy2 = model2.evaluate(test_ds, batch_size=128, verbose=2)
print(accuracy2)
21/21 - 1s - loss: 3.4147 - accuracy: 0.5235 - 953ms/epoch - 45ms/step
0.5234615206718445

Observations and insights: With the batch normalization layer in place, the validation accuracy is highly unstable across epochs (swinging between roughly 0.50 and 0.90), and the test accuracy is only 0.52. For this particular model, batch normalization may not be the appropriate technique for reducing overfitting; a dropout layer may be more effective.
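
One mechanism that can make training and validation behave differently with batch normalization is that the layer normalizes with the current batch's statistics during training but with moving averages at inference. The NumPy sketch below illustrates that difference on synthetic activations, assuming the Keras default momentum (0.99) and epsilon (0.001) and omitting the learned scale and shift. It is only an illustration of the mechanism; whether it explains the instability seen here is not established.

```python
import numpy as np

rng = np.random.default_rng(117)
momentum, eps = 0.99, 1e-3          # assumed Keras defaults for BatchNormalization
moving_mean, moving_var = 0.0, 1.0  # initial moving statistics

# Simulate a handful of training steps on activations with mean 5 and std 2.
for _ in range(10):
    batch = rng.normal(loc=5.0, scale=2.0, size=128)
    moving_mean = momentum * moving_mean + (1 - momentum) * batch.mean()
    moving_var = momentum * moving_var + (1 - momentum) * batch.var()

batch = rng.normal(loc=5.0, scale=2.0, size=128)
train_out = (batch - batch.mean()) / np.sqrt(batch.var() + eps)  # training mode
infer_out = (batch - moving_mean) / np.sqrt(moving_var + eps)    # inference mode

print(round(float(train_out.mean()), 3), round(float(infer_out.mean()), 3))
```

After only a few updates the moving averages still lag far behind the true statistics, so the same activations are centered in training mode but shifted in inference mode. With more training steps the two converge.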

Generate the classification report and confusion matrix

In [64]:
y_predictions = model2.predict(X_test, batch_size=128, verbose=2)
y_predictions = np.argmax(y_predictions, axis=1) # need to get the highest probability indices along axis 1 (row axis)

conf_matrix_model2 = tf.math.confusion_matrix(y_test, y_predictions)
plt.figure(figsize=(12, 8))
xlabels = ['Parasitized', 'Uninfected']
ylabels = ['Parasitized', 'Uninfected']
sns.heatmap(conf_matrix_model2, annot=True, fmt= '.0f', xticklabels=xlabels, yticklabels=ylabels) # plot this model's matrix, not model 0's
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
21/21 - 0s - 168ms/epoch - 8ms/step

Think About It :

  • Can we improve the model with Image Data Augmentation?
    • Absolutely. Data augmentation will make the model more robust to variations naturally seen in real-world scenarios and datasets beyond the one available to us here.
  • References to image data augmentation can be seen below:
    • Image Augmentation for Computer Vision
    • How to Configure Image Data Augmentation in Keras?

Model 3 with Data Augmentation¶

In [65]:
reset_session()

Use image data generator

In [66]:
# Create data generator
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator( # for use in model, image data generator for augmentation
    rotation_range=180,
    shear_range=25,
    zoom_range=[0.8, 1.2],
    brightness_range=[0.5, 1.5],
    fill_mode='nearest'
)

Think About It :

  • Check if the performance of the model can be improved by changing different parameters in the ImageDataGenerator.
    • The model may be improved by tuning parameters in the ImageDataGenerator; however, the best model found was fairly simple, with only image rotation applied. Shearing, zooming, brightening and darkening, and a host of other modifications were also applied to the images, but these only served to decrease accuracy on the test data.

Visualizing Augmented images

In [67]:
train_labels_ser = pd.Series(y_train).map({0: 'Parasitized', 1: 'Uninfected'}) # for labeling in plot
In [68]:
datagen_test = ImageDataGenerator(
    rotation_range=180,
    fill_mode='nearest')

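# batch_size=1 with shuffle=False so the i-th image lines up with train_labels_ser[i]
img = datagen_test.flow(X_train/255, batch_size=1, shuffle=False)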
plt.figure(figsize=(12,8))

for i in range(6):
  plt.subplot(2, 3, i+1)
  batch = img.next()
  aug_img = batch[0]
  plt.imshow(aug_img)
  plt.xlabel(train_labels_ser[i])

plt.show()
In [69]:
datagen_test = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest',
)

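# batch_size=1 with shuffle=False so the i-th image lines up with train_labels_ser[i]
img = datagen_test.flow(X_train/255, batch_size=1, shuffle=False)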
plt.figure(figsize=(12,8))

for i in range(6):
  plt.subplot(2, 3, i+1)
  batch = img.next()
  aug_img = batch[0]
  plt.imshow(aug_img)
  plt.xlabel(train_labels_ser[i])

plt.show()
In [70]:
datagen_test = ImageDataGenerator(
    width_shift_range=[-16, 16],
    fill_mode='nearest',
)

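# batch_size=1 with shuffle=False so the i-th image lines up with train_labels_ser[i]
img = datagen_test.flow(X_train/255, batch_size=1, shuffle=False)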
plt.figure(figsize=(12,8))

for i in range(6):
  plt.subplot(2, 3, i+1)
  batch = img.next()
  aug_img = batch[0]
  plt.imshow(aug_img)
  plt.xlabel(train_labels_ser[i])

plt.show()
In [71]:
datagen_test = ImageDataGenerator(
    shear_range=40,
    fill_mode='nearest',
)

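# batch_size=1 with shuffle=False so the i-th image lines up with train_labels_ser[i]
img = datagen_test.flow(X_train/255, batch_size=1, shuffle=False)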
plt.figure(figsize=(12,8))

for i in range(6):
  plt.subplot(2, 3, i+1)
  batch = img.next()
  aug_img = batch[0]
  plt.imshow(aug_img)
  plt.xlabel(train_labels_ser[i])

plt.show()
In [72]:
datagen_test = ImageDataGenerator(
    zoom_range=[0.5, 1.5],
    fill_mode='nearest',
)

# shuffle=False so the i-th image lines up with train_labels_ser[i]
img = datagen_test.flow(X_train, batch_size=1, shuffle=False)
plt.figure(figsize=(12,8))

for i in range(6):
  plt.subplot(2, 3, i+1)
  batch = img.next()
  aug_img = batch[0].astype('uint8')
  plt.imshow(aug_img/255)
  plt.xlabel(train_labels_ser[i])

plt.show()

Observations and insights: Some transformations may be meaningful to use in data augmentation, such as shear, rotation, width shift, and zoom. However, if these are not parametrized correctly, they can produce images that would never occur in reality. Correct parametrization should make the model more robust to variations in real-world data.
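
To make the parametrization point concrete, the sketch below mimics, in plain Python, what a horizontal shift with fill_mode='nearest' does to a single row of pixels: positions shifted in from outside the image repeat the nearest edge value. It is a toy illustration rather than the Keras implementation, and shift_row_nearest is a made-up helper name.

```python
def shift_row_nearest(row, shift):
    """Shift a row of pixels right (shift > 0) or left (shift < 0).

    Source indices that fall outside the row are clamped to the nearest edge,
    which is the idea behind fill_mode='nearest'.
    """
    last = len(row) - 1
    return [row[min(max(i - shift, 0), last)] for i in range(len(row))]

row = [10, 20, 30, 40, 50, 60, 70, 80]
print(shift_row_nearest(row, 2))   # [10, 10, 10, 20, 30, 40, 50, 60]
print(shift_row_nearest(row, -2))  # [30, 40, 50, 60, 70, 80, 80, 80]
```

By the same logic, a 16-pixel shift on a 64-pixel-wide image (as in the width_shift_range=[-16, 16] cell above) fills a quarter of every row with repeated edge values, which is one way an overly aggressive setting can yield unrealistic images.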

Building the Model

In [73]:
model3 = Sequential([ 
    Rescaling(1./255, input_shape=(IMG_SIZE, IMG_SIZE, 3)), #use this layer for normalization of freshly imported dataset
    Conv2D(16, 3, padding='same', activation='tanh'),
    MaxPooling2D((2, 2)),
    Conv2D(32, 3, padding='same'),
    LeakyReLU(0.1),
    MaxPooling2D((2, 2)),
    Conv2D(16, 3, padding='same'),
    LeakyReLU(0.1),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.1),
    Dense(64, activation='relu'),
    Dense(2, activation='softmax') # set this to 2 because we have 2 classes
])
In [74]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model3.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model3.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 rescaling (Rescaling)       (None, 64, 64, 3)         0         
                                                                 
 conv2d (Conv2D)             (None, 64, 64, 16)        448       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 32, 32, 16)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 32, 32, 32)        4640      
                                                                 
 leaky_re_lu (LeakyReLU)     (None, 32, 32, 32)        0         
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 16, 16, 32)       0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 16, 16, 16)        4624      
                                                                 
 leaky_re_lu_1 (LeakyReLU)   (None, 16, 16, 16)        0         
                                                                 
 flatten (Flatten)           (None, 4096)              0         
                                                                 
 dense (Dense)               (None, 128)               524416    
                                                                 
 dropout (Dropout)           (None, 128)               0         
                                                                 
 dense_1 (Dense)             (None, 64)                8256      
                                                                 
 dense_2 (Dense)             (None, 2)                 130       
                                                                 
=================================================================
Total params: 542,514
Trainable params: 542,514
Non-trainable params: 0
_________________________________________________________________
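
As a sanity check on the summary above, the parameter counts can be reproduced by hand. The sketch below is plain Python (no Keras); the helper names conv2d_params and dense_params are made up for illustration, and the formulas assume 3x3 kernels with a bias term, which matches the Conv2D and Dense layers defined for model3.

```python
# Illustrative helpers (not part of Keras): count weights plus biases per layer.
def conv2d_params(kernel, in_channels, filters):
    # each filter has kernel*kernel*in_channels weights plus one bias
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(in_units, out_units):
    # one weight per input-output pair plus one bias per output unit
    return (in_units + 1) * out_units

layer_params = {
    'conv2d': conv2d_params(3, 3, 16),
    'conv2d_1': conv2d_params(3, 16, 32),
    'conv2d_2': conv2d_params(3, 32, 16),
    'dense': dense_params(16 * 16 * 16, 128),  # Flatten output is 16*16*16 = 4096
    'dense_1': dense_params(128, 64),
    'dense_2': dense_params(64, 2),
}
total_params = sum(layer_params.values())
print(layer_params)
print(total_params)
```

Almost all of the parameters (524,416 of 542,514) sit in the first dense layer, so the size of the flattened feature map is what drives the overall model size.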

Using Callbacks

In [75]:
# create model checkpoint, filepath
# model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3-{epoch:02d}-{val_accuracy:.4f}.hdf5'
model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3.hdf5'
callbacks = [
    EarlyStopping(monitor='val_loss', mode='min', patience=5, verbose=1),
    ModelCheckpoint(model_path, monitor='val_accuracy', mode='max', save_best_only=True, verbose=1)
]

Fit and Train the model

In [76]:
epochs = 30 # as the best model, will train this for more epochs than others
history_3 = model3.fit(
    datagen.flow(X_train, y_train, batch_size=128),
    validation_data=datagen.flow(X_val, y_val, batch_size=128),
    epochs=epochs,
    callbacks=callbacks,
)
Epoch 1/30
156/156 [==============================] - ETA: 0s - loss: 0.6496 - accuracy: 0.6183
Epoch 1: val_accuracy improved from -inf to 0.73532, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3.hdf5
156/156 [==============================] - 36s 215ms/step - loss: 0.6496 - accuracy: 0.6183 - val_loss: 0.5622 - val_accuracy: 0.7353
Epoch 2/30
156/156 [==============================] - ETA: 0s - loss: 0.3954 - accuracy: 0.8290
Epoch 2: val_accuracy improved from 0.73532 to 0.91585, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3.hdf5
156/156 [==============================] - 34s 215ms/step - loss: 0.3954 - accuracy: 0.8290 - val_loss: 0.2117 - val_accuracy: 0.9158
Epoch 3/30
156/156 [==============================] - ETA: 0s - loss: 0.2011 - accuracy: 0.9275
Epoch 3: val_accuracy improved from 0.91585 to 0.94069, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3.hdf5
156/156 [==============================] - 34s 218ms/step - loss: 0.2011 - accuracy: 0.9275 - val_loss: 0.1660 - val_accuracy: 0.9407
Epoch 4/30
156/156 [==============================] - ETA: 0s - loss: 0.1430 - accuracy: 0.9497
Epoch 4: val_accuracy improved from 0.94069 to 0.96113, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3.hdf5
156/156 [==============================] - 34s 217ms/step - loss: 0.1430 - accuracy: 0.9497 - val_loss: 0.1265 - val_accuracy: 0.9611
Epoch 5/30
156/156 [==============================] - ETA: 0s - loss: 0.1288 - accuracy: 0.9565
Epoch 5: val_accuracy improved from 0.96113 to 0.96153, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3.hdf5
156/156 [==============================] - 34s 219ms/step - loss: 0.1288 - accuracy: 0.9565 - val_loss: 0.1051 - val_accuracy: 0.9615
Epoch 6/30
156/156 [==============================] - ETA: 0s - loss: 0.1161 - accuracy: 0.9617
Epoch 6: val_accuracy improved from 0.96153 to 0.96173, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3.hdf5
156/156 [==============================] - 34s 216ms/step - loss: 0.1161 - accuracy: 0.9617 - val_loss: 0.1206 - val_accuracy: 0.9617
Epoch 7/30
156/156 [==============================] - ETA: 0s - loss: 0.1141 - accuracy: 0.9637
Epoch 7: val_accuracy improved from 0.96173 to 0.96975, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3.hdf5
156/156 [==============================] - 33s 215ms/step - loss: 0.1141 - accuracy: 0.9637 - val_loss: 0.0999 - val_accuracy: 0.9697
Epoch 8/30
156/156 [==============================] - ETA: 0s - loss: 0.0988 - accuracy: 0.9680
Epoch 8: val_accuracy improved from 0.96975 to 0.97155, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3.hdf5
156/156 [==============================] - 33s 214ms/step - loss: 0.0988 - accuracy: 0.9680 - val_loss: 0.0864 - val_accuracy: 0.9715
Epoch 9/30
156/156 [==============================] - ETA: 0s - loss: 0.0935 - accuracy: 0.9694
Epoch 9: val_accuracy did not improve from 0.97155
156/156 [==============================] - 33s 211ms/step - loss: 0.0935 - accuracy: 0.9694 - val_loss: 0.0995 - val_accuracy: 0.9687
Epoch 10/30
156/156 [==============================] - ETA: 0s - loss: 0.0892 - accuracy: 0.9720
Epoch 10: val_accuracy improved from 0.97155 to 0.97275, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3.hdf5
156/156 [==============================] - 34s 215ms/step - loss: 0.0892 - accuracy: 0.9720 - val_loss: 0.0859 - val_accuracy: 0.9728
Epoch 11/30
156/156 [==============================] - ETA: 0s - loss: 0.0893 - accuracy: 0.9720
Epoch 11: val_accuracy improved from 0.97275 to 0.97576, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3.hdf5
156/156 [==============================] - 34s 215ms/step - loss: 0.0893 - accuracy: 0.9720 - val_loss: 0.0792 - val_accuracy: 0.9758
Epoch 12/30
156/156 [==============================] - ETA: 0s - loss: 0.0856 - accuracy: 0.9721
Epoch 12: val_accuracy did not improve from 0.97576
156/156 [==============================] - 33s 214ms/step - loss: 0.0856 - accuracy: 0.9721 - val_loss: 0.0886 - val_accuracy: 0.9724
Epoch 13/30
156/156 [==============================] - ETA: 0s - loss: 0.0838 - accuracy: 0.9737
Epoch 13: val_accuracy did not improve from 0.97576
156/156 [==============================] - 34s 215ms/step - loss: 0.0838 - accuracy: 0.9737 - val_loss: 0.0797 - val_accuracy: 0.9750
Epoch 14/30
156/156 [==============================] - ETA: 0s - loss: 0.0831 - accuracy: 0.9737
Epoch 14: val_accuracy did not improve from 0.97576
156/156 [==============================] - 33s 214ms/step - loss: 0.0831 - accuracy: 0.9737 - val_loss: 0.0836 - val_accuracy: 0.9715
Epoch 15/30
156/156 [==============================] - ETA: 0s - loss: 0.0805 - accuracy: 0.9740
Epoch 15: val_accuracy did not improve from 0.97576
156/156 [==============================] - 33s 213ms/step - loss: 0.0805 - accuracy: 0.9740 - val_loss: 0.0808 - val_accuracy: 0.9734
Epoch 16/30
156/156 [==============================] - ETA: 0s - loss: 0.0833 - accuracy: 0.9745
Epoch 16: val_accuracy improved from 0.97576 to 0.97676, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model3_checkpoints/model3.hdf5
156/156 [==============================] - 35s 226ms/step - loss: 0.0833 - accuracy: 0.9745 - val_loss: 0.0711 - val_accuracy: 0.9768
Epoch 17/30
156/156 [==============================] - ETA: 0s - loss: 0.0797 - accuracy: 0.9725
Epoch 17: val_accuracy did not improve from 0.97676
156/156 [==============================] - 33s 212ms/step - loss: 0.0797 - accuracy: 0.9725 - val_loss: 0.0802 - val_accuracy: 0.9756
Epoch 18/30
156/156 [==============================] - ETA: 0s - loss: 0.0761 - accuracy: 0.9735
Epoch 18: val_accuracy did not improve from 0.97676
156/156 [==============================] - 32s 208ms/step - loss: 0.0761 - accuracy: 0.9735 - val_loss: 0.0749 - val_accuracy: 0.9766
Epoch 19/30
156/156 [==============================] - ETA: 0s - loss: 0.0789 - accuracy: 0.9752
Epoch 19: val_accuracy did not improve from 0.97676
156/156 [==============================] - 32s 208ms/step - loss: 0.0789 - accuracy: 0.9752 - val_loss: 0.0767 - val_accuracy: 0.9748
Epoch 20/30
156/156 [==============================] - ETA: 0s - loss: 0.0759 - accuracy: 0.9751
Epoch 20: val_accuracy did not improve from 0.97676
156/156 [==============================] - 33s 212ms/step - loss: 0.0759 - accuracy: 0.9751 - val_loss: 0.0876 - val_accuracy: 0.9754
Epoch 21/30
156/156 [==============================] - ETA: 0s - loss: 0.0765 - accuracy: 0.9750
Epoch 21: val_accuracy did not improve from 0.97676
156/156 [==============================] - 33s 209ms/step - loss: 0.0765 - accuracy: 0.9750 - val_loss: 0.0792 - val_accuracy: 0.9738
Epoch 21: early stopping
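
The stop at epoch 21 follows from the EarlyStopping settings (monitor='val_loss', patience=5). The sketch below replays that rule in plain Python on the val_loss values copied from the log above. It is a simplified re-implementation for illustration (assuming the default min_delta of 0), not the Keras callback itself, and early_stop_epoch is a made-up helper name.

```python
def early_stop_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    stop_epoch is None if the patience limit is never reached.
    """
    best = float('inf')
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# val_loss per epoch, copied from the training log above
val_loss_model3 = [0.5622, 0.2117, 0.1660, 0.1265, 0.1051, 0.1206, 0.0999,
                   0.0864, 0.0995, 0.0859, 0.0792, 0.0886, 0.0797, 0.0836,
                   0.0808, 0.0711, 0.0802, 0.0749, 0.0767, 0.0876, 0.0792]

stop_epoch, best_epoch = early_stop_epoch(val_loss_model3, patience=5)
print(stop_epoch, best_epoch)  # 21 16
```

The lowest val_loss (epoch 16) coincides with the last checkpoint written by ModelCheckpoint. Since restore_best_weights was not set on EarlyStopping, the weights left in memory after training are those from epoch 21, not epoch 16.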

Evaluating the model

In [126]:
loss3, accuracy3 = model3.evaluate(test_ds, batch_size=128, verbose=2)
print(accuracy3)
21/21 - 1s - loss: 0.0482 - accuracy: 0.9850 - 904ms/epoch - 43ms/step
0.9850000143051147

Plot the train and validation accuracy

In [88]:
plt.plot(history_3.history['accuracy'])
plt.plot(history_3.history['val_accuracy'])

plt.legend(['Train', 'Validation'])

plt.title('Accuracy vs. Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')

plt.show()

Plotting the classification report and confusion matrix

In [89]:
y_predictions = model3.predict(X_test, batch_size=128, verbose=2)
y_predictions = np.argmax(y_predictions, axis=1) # need to get the highest probability indices along axis 1 (row axis)
21/21 - 0s - 84ms/epoch - 4ms/step
In [90]:
conf_matrix_model3 = tf.math.confusion_matrix(y_test, y_predictions)
plt.figure(figsize=(12, 8))
xlabels = ['Parasitized', 'Uninfected']
ylabels = ['Parasitized', 'Uninfected']
sns.heatmap(conf_matrix_model3, annot=True, fmt= '.0f', xticklabels=xlabels, yticklabels=ylabels)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
In [91]:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred=y_predictions))
              precision    recall  f1-score   support

         0.0       0.98      0.99      0.99      1300
         1.0       0.99      0.98      0.98      1300

    accuracy                           0.98      2600
   macro avg       0.99      0.98      0.98      2600
weighted avg       0.99      0.98      0.98      2600

All around, this model with data augmentation performs excellently. At 98.5% test accuracy, it is more accurate than any other model. For Uninfected instances, precision (0.99) is higher than recall (0.98), so the model is more prone to false negatives than false positives for that class. Likewise, for Parasitized instances, recall (0.99) is higher than precision (0.98), so the model is more prone to false positives than false negatives. This is actually ideal: in malaria diagnosis, which is a health issue, a false positive in parasitized instances is far less costly than a false negative.
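
The link between a confusion matrix and the precision and recall figures in the report can be sketched in plain Python. The counts below are hypothetical placeholders chosen only to illustrate the arithmetic (they are not the values from the heatmap above), and per_class_metrics is a made-up helper. Rows are actual classes and columns are predicted classes, matching the tf.math.confusion_matrix(y_test, y_predictions) convention.

```python
def per_class_metrics(conf_matrix, cls):
    """Precision and recall for class index `cls` (rows = actual, columns = predicted)."""
    true_pos = conf_matrix[cls][cls]
    predicted_as_cls = sum(row[cls] for row in conf_matrix)  # column sum
    actual_cls = sum(conf_matrix[cls])                       # row sum
    return true_pos / predicted_as_cls, true_pos / actual_cls

# Hypothetical counts for illustration only: 0 = Parasitized, 1 = Uninfected
example_matrix = [[90, 10],
                  [20, 80]]

precision0, recall0 = per_class_metrics(example_matrix, 0)
print(round(precision0, 3), round(recall0, 3))
```

In this toy matrix, recall exceeds precision for the Parasitized class, meaning missed infections (10) are rarer than false alarms (20). That is the same direction of imbalance described above.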

Now, let us try to use a pretrained model like VGG16 and check how it performs on our data.

Pre-trained model (VGG16)¶

  • Import VGG16 network up to any layer you choose
  • Add Fully Connected Layers on top of it
In [92]:
from tensorflow.keras.applications.vgg16 import VGG16
model_vgg = VGG16(include_top=False, weights='imagenet', input_shape=(IMG_SIZE, IMG_SIZE, 3))
model_vgg.trainable = False
model_vgg.summary()
Model: "vgg16"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 64, 64, 3)]       0         
                                                                 
 block1_conv1 (Conv2D)       (None, 64, 64, 64)        1792      
                                                                 
 block1_conv2 (Conv2D)       (None, 64, 64, 64)        36928     
                                                                 
 block1_pool (MaxPooling2D)  (None, 32, 32, 64)        0         
                                                                 
 block2_conv1 (Conv2D)       (None, 32, 32, 128)       73856     
                                                                 
 block2_conv2 (Conv2D)       (None, 32, 32, 128)       147584    
                                                                 
 block2_pool (MaxPooling2D)  (None, 16, 16, 128)       0         
                                                                 
 block3_conv1 (Conv2D)       (None, 16, 16, 256)       295168    
                                                                 
 block3_conv2 (Conv2D)       (None, 16, 16, 256)       590080    
                                                                 
 block3_conv3 (Conv2D)       (None, 16, 16, 256)       590080    
                                                                 
 block3_pool (MaxPooling2D)  (None, 8, 8, 256)         0         
                                                                 
 block4_conv1 (Conv2D)       (None, 8, 8, 512)         1180160   
                                                                 
 block4_conv2 (Conv2D)       (None, 8, 8, 512)         2359808   
                                                                 
 block4_conv3 (Conv2D)       (None, 8, 8, 512)         2359808   
                                                                 
 block4_pool (MaxPooling2D)  (None, 4, 4, 512)         0         
                                                                 
 block5_conv1 (Conv2D)       (None, 4, 4, 512)         2359808   
                                                                 
 block5_conv2 (Conv2D)       (None, 4, 4, 512)         2359808   
                                                                 
 block5_conv3 (Conv2D)       (None, 4, 4, 512)         2359808   
                                                                 
 block5_pool (MaxPooling2D)  (None, 2, 2, 512)         0         
                                                                 
=================================================================
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
_________________________________________________________________
In [93]:
# REF: https://towardsdatascience.com/transfer-learning-with-vgg16-and-keras-50ea161580b4 
model4 = Sequential([ 
    model_vgg,
    # NOTE: because this layer comes after the VGG16 base, it rescales the extracted
    # features rather than the input pixels (see rescaling_2 in the summary below)
    Rescaling(1./255, input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.1),
    Dense(64, activation='relu'),
    Dense(2, activation='softmax') # set this to 2 because we have 2 classes
])

Compiling the model

In [94]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model4.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model4.summary()
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 vgg16 (Functional)          (None, 2, 2, 512)         14714688  
                                                                 
 rescaling_2 (Rescaling)     (None, 2, 2, 512)         0         
                                                                 
 flatten_2 (Flatten)         (None, 2048)              0         
                                                                 
 dense_6 (Dense)             (None, 128)               262272    
                                                                 
 dropout_2 (Dropout)         (None, 128)               0         
                                                                 
 dense_7 (Dense)             (None, 64)                8256      
                                                                 
 dense_8 (Dense)             (None, 2)                 130       
                                                                 
=================================================================
Total params: 14,985,346
Trainable params: 270,658
Non-trainable params: 14,714,688
_________________________________________________________________
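
The shapes and the trainable-parameter count in the summary above can be checked with a little arithmetic. This plain-Python sketch assumes what the vgg16 summary shows: five blocks, each ending in a 2x2 max-pool that halves the spatial size. Variable names are illustrative.

```python
img_size = 64          # input height/width, from the summary's (None, 64, 64, 3)
num_pool_layers = 5    # block1_pool through block5_pool
final_channels = 512   # channels coming out of block5

feature_size = img_size // (2 ** num_pool_layers)          # 64 -> 2
flattened = feature_size * feature_size * final_channels   # 2 * 2 * 512

# Only the dense head is trainable; each Dense layer has (inputs + 1) * units parameters
head_units = [128, 64, 2]
trainable = 0
inputs = flattened
for units in head_units:
    trainable += (inputs + 1) * units
    inputs = units

print(feature_size, flattened, trainable)
```

With the base frozen, the 270,658 trainable parameters all belong to the three dense layers, roughly half the size of Model 3's trainable set.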

Using callbacks

In [95]:
# create model checkpoint, filepath
# model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model4_checkpoints/model4-{epoch:02d}-{val_accuracy:.4f}.hdf5'
model_path = '/content/drive/MyDrive/MIT ADSP Capstone/model4_checkpoints/model4.hdf5'
callbacks = [
    EarlyStopping(monitor='val_loss', mode='min', patience=5, verbose=1),
    ModelCheckpoint(model_path, monitor='val_accuracy', mode='max', save_best_only=True, verbose=1)
]

Fit and Train the model

In [96]:
epochs = 15
history_4 = model4.fit(
    datagen.flow(X_train, y_train, batch_size=128),
    validation_data=datagen.flow(X_val, y_val, batch_size=128),
    epochs=epochs,
    callbacks=callbacks,
)
Epoch 1/15
156/156 [==============================] - ETA: 0s - loss: 0.2338 - accuracy: 0.9187
Epoch 1: val_accuracy improved from -inf to 0.94330, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model4_checkpoints/model4.hdf5
156/156 [==============================] - 35s 218ms/step - loss: 0.2338 - accuracy: 0.9187 - val_loss: 0.1608 - val_accuracy: 0.9433
Epoch 2/15
156/156 [==============================] - ETA: 0s - loss: 0.1563 - accuracy: 0.9447
Epoch 2: val_accuracy improved from 0.94330 to 0.94470, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model4_checkpoints/model4.hdf5
156/156 [==============================] - 34s 216ms/step - loss: 0.1563 - accuracy: 0.9447 - val_loss: 0.1538 - val_accuracy: 0.9447
Epoch 3/15
156/156 [==============================] - ETA: 0s - loss: 0.1428 - accuracy: 0.9493
Epoch 3: val_accuracy improved from 0.94470 to 0.94570, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model4_checkpoints/model4.hdf5
156/156 [==============================] - 34s 216ms/step - loss: 0.1428 - accuracy: 0.9493 - val_loss: 0.1460 - val_accuracy: 0.9457
Epoch 4/15
156/156 [==============================] - ETA: 0s - loss: 0.1420 - accuracy: 0.9478
Epoch 4: val_accuracy improved from 0.94570 to 0.95352, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model4_checkpoints/model4.hdf5
156/156 [==============================] - 34s 220ms/step - loss: 0.1420 - accuracy: 0.9478 - val_loss: 0.1296 - val_accuracy: 0.9535
Epoch 5/15
156/156 [==============================] - ETA: 0s - loss: 0.1370 - accuracy: 0.9507
Epoch 5: val_accuracy did not improve from 0.95352
156/156 [==============================] - 33s 210ms/step - loss: 0.1370 - accuracy: 0.9507 - val_loss: 0.1316 - val_accuracy: 0.9487
Epoch 6/15
156/156 [==============================] - ETA: 0s - loss: 0.1317 - accuracy: 0.9511
Epoch 6: val_accuracy improved from 0.95352 to 0.95612, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model4_checkpoints/model4.hdf5
156/156 [==============================] - 34s 218ms/step - loss: 0.1317 - accuracy: 0.9511 - val_loss: 0.1194 - val_accuracy: 0.9561
Epoch 7/15
156/156 [==============================] - ETA: 0s - loss: 0.1260 - accuracy: 0.9552
Epoch 7: val_accuracy improved from 0.95612 to 0.96013, saving model to /content/drive/MyDrive/MIT ADSP Capstone/model4_checkpoints/model4.hdf5
156/156 [==============================] - 35s 221ms/step - loss: 0.1260 - accuracy: 0.9552 - val_loss: 0.1129 - val_accuracy: 0.9601
Epoch 8/15
156/156 [==============================] - ETA: 0s - loss: 0.1241 - accuracy: 0.9537
Epoch 8: val_accuracy did not improve from 0.96013
156/156 [==============================] - 33s 213ms/step - loss: 0.1241 - accuracy: 0.9537 - val_loss: 0.1226 - val_accuracy: 0.9531
Epoch 9/15
156/156 [==============================] - ETA: 0s - loss: 0.1255 - accuracy: 0.9549
Epoch 9: val_accuracy did not improve from 0.96013
156/156 [==============================] - 32s 208ms/step - loss: 0.1255 - accuracy: 0.9549 - val_loss: 0.1137 - val_accuracy: 0.9593
Epoch 10/15
156/156 [==============================] - ETA: 0s - loss: 0.1229 - accuracy: 0.9559
Epoch 10: val_accuracy did not improve from 0.96013
156/156 [==============================] - 33s 209ms/step - loss: 0.1229 - accuracy: 0.9559 - val_loss: 0.1285 - val_accuracy: 0.9491
Epoch 11/15
156/156 [==============================] - ETA: 0s - loss: 0.1264 - accuracy: 0.9531
Epoch 11: val_accuracy did not improve from 0.96013
156/156 [==============================] - 33s 210ms/step - loss: 0.1264 - accuracy: 0.9531 - val_loss: 0.1207 - val_accuracy: 0.9579
Epoch 12/15
156/156 [==============================] - ETA: 0s - loss: 0.1190 - accuracy: 0.9579
Epoch 12: val_accuracy did not improve from 0.96013
156/156 [==============================] - 33s 209ms/step - loss: 0.1190 - accuracy: 0.9579 - val_loss: 0.1242 - val_accuracy: 0.9579
Epoch 12: early stopping

Plot the train and validation accuracy

In [97]:
plt.plot(history_4.history['accuracy'])
plt.plot(history_4.history['val_accuracy'])

plt.legend(['Train', 'Validation'])

plt.title('Accuracy vs. Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')

plt.show()

Observations and insights:

  • What can be observed from the validation and train curves?

The model starts out with very high accuracy, but it does not reach the peak that the previous model achieved (though it comes close). This may be because the model is in a sense "over-regularized" and may need more layers on top of the frozen VGG16 base to extract more features and perform computation on them. However, this was not explored due to computing resource constraints.

Evaluating the model

In [128]:
loss4, accuracy4 = model4.evaluate(test_ds, batch_size=128, verbose=2)
print(accuracy4)
21/21 - 1s - loss: 0.1020 - accuracy: 0.9615 - 1s/epoch - 60ms/step
0.9615384340286255

Plotting the classification report and confusion matrix

In [99]:
y_predictions = model4.predict(X_test, batch_size=128, verbose=2)
y_predictions = np.argmax(y_predictions, axis=1) # need to get the highest probability indices along axis 1 (row axis)
21/21 - 0s - 467ms/epoch - 22ms/step
In [102]:
conf_matrix_model4 = tf.math.confusion_matrix(y_test, y_predictions)
plt.figure(figsize=(12, 8))
xlabels = ['Parasitized', 'Uninfected']
ylabels = ['Parasitized', 'Uninfected']
sns.heatmap(conf_matrix_model4, annot=True, fmt= '.0f', xticklabels=xlabels, yticklabels=ylabels)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
In [101]:
print(classification_report(y_test, y_pred=y_predictions))
              precision    recall  f1-score   support

         0.0       0.97      0.95      0.96      1300
         1.0       0.95      0.97      0.96      1300

    accuracy                           0.96      2600
   macro avg       0.96      0.96      0.96      2600
weighted avg       0.96      0.96      0.96      2600

Think about it:

  • What observations and insights can be drawn from the confusion matrix and classification report?

    • Note: 0 is Parasitized, 1 is Uninfected
    • We can see that this model is about 96% accurate overall. The model is strong all around, but its errors fall on the less desirable side. For Parasitized instances, precision (0.97) is higher than recall (0.95), so the model is more prone to false negatives than false positives. Likewise, for Uninfected instances, recall (0.97) is higher than precision (0.95), so the model is more prone to false positives than false negatives for that class. This is the opposite of the pattern seen in Model 3 and is not ideal: because this is a health issue, a false negative in parasitized instances is more costly than a false positive.
  • Choose the model with the best accuracy scores from all the above models and save it as a final model.

    • The best model is model3, the model prior to the one directly above.
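
To put the recall figures in concrete terms, the sketch below converts the rounded Parasitized recall from each classification report into an approximate number of missed infected cells out of the 1,300 Parasitized test images. Because the reports round to two decimals, the counts are only rough estimates, and approx_missed is a made-up helper name.

```python
def approx_missed(recall, support):
    """Approximate number of false negatives implied by a rounded recall value."""
    return round(support * (1 - recall))

support_parasitized = 1300
missed_model3 = approx_missed(0.99, support_parasitized)  # recall from the Model 3 report
missed_model4 = approx_missed(0.95, support_parasitized)  # recall from the VGG16 report
print(missed_model3, missed_model4)
```

On these rounded figures, the VGG16 model would miss roughly five times as many infected cells as Model 3, which supports choosing Model 3 when false negatives are the costliest error.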

A quick comparison of all the models:¶

In [133]:
# Best model was saved as part of callback method for model 3, with data augmentation.
model_df = pd.DataFrame({'Model 0': [accuracy0, loss0],
                         'Model 1': [accuracy1, loss1],
                         'Model 2': [accuracy2, loss2],
                         'Model 3': [accuracy3, loss3],
                         'Model 4': [accuracy4, loss4]}, index=['Accuracy', 'Loss'])
model_df
Out[133]:
          Model 0   Model 1   Model 2   Model 3   Model 4
Accuracy  0.966538  0.973846  0.523462  0.985000  0.961538
Loss      0.096982  0.080863  3.414724  0.048173  0.102036

Observations and Conclusions drawn from the final model:

We can see clearly that Model 3, with data augmentation and the dropout layer, performs the best, even better than Model 4 (the transfer learning VGG16 model), all while being significantly smaller (~540K parameters vs. ~15M parameters).
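
The comparison can also be made programmatically. The sketch below is plain Python using the accuracy and loss values copied from the table above and the total parameter counts from the two model summaries; the dictionary layout and variable names are illustrative choices.

```python
# Test-set metrics copied from the comparison table above
results = {
    'Model 0': {'accuracy': 0.966538, 'loss': 0.096982},
    'Model 1': {'accuracy': 0.973846, 'loss': 0.080863},
    'Model 2': {'accuracy': 0.523462, 'loss': 3.414724},
    'Model 3': {'accuracy': 0.985000, 'loss': 0.048173},
    'Model 4': {'accuracy': 0.961538, 'loss': 0.102036},
}

best_model = max(results, key=lambda name: results[name]['accuracy'])
lowest_loss_model = min(results, key=lambda name: results[name]['loss'])

# Total parameter counts from the model summaries
params_model3 = 542_514
params_model4 = 14_985_346
size_ratio = params_model4 / params_model3

print(best_model, lowest_loss_model, round(size_ratio, 1))
```

Model 3 comes out on top by both accuracy and loss, while the VGG16-based model carries roughly 27 to 28 times as many parameters in total.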

As stated above, this model with data augmentation performs excellently, with the highest test accuracy of any model at 98.5%. For Uninfected instances, it is more prone to false negatives than false positives (higher precision than recall). Likewise, for Parasitized instances, it is more prone to false positives than false negatives (lower precision than recall). This is actually ideal, because in malaria diagnosis it is more beneficial to have a false positive than a false negative in parasitized instances.

Improvements that can be done:

  • Can the model performance be improved using other pre-trained models or different CNN architecture?
    • Yes, the model may improve with a different pre-trained model or a different architecture. Possible changes include altering the learning rate, changing an activation function, or adding and removing certain layers. More testing and more data would be helpful in getting accuracy higher than 98%.
    • Converting the images to the HSV color space may also be useful. Additionally, data augmentation strategies such as Gaussian blurring may be adopted with a slightly different pipeline. For instance, the Albumentations library offers more image transformations than TensorFlow's built-in ImageDataGenerator (ref: https://albumentations.ai/).
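
As a small illustration of the HSV idea, the standard-library colorsys module converts a single RGB pixel to HSV. This is only a per-pixel sketch under the assumption that channel values are scaled to [0, 1] before conversion; the helper name rgb255_to_hsv and the sample pixel are made up, and whole image arrays would normally be converted with a vectorized routine instead.

```python
import colorsys

def rgb255_to_hsv(r, g, b):
    """Convert 0-255 RGB channel values to HSV components in [0, 1]."""
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)

# Arbitrary example pixel (pure red), not taken from the dataset
hue, sat, val = rgb255_to_hsv(255, 0, 0)
print(hue, sat, val)
```

Within a TensorFlow pipeline, tf.image.rgb_to_hsv performs the equivalent conversion on float images, so the change could be tried without leaving the existing tooling.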

Insights

Refined insights:

  • What are the most meaningful insights from the data relevant to the problem?
    • Cells infected with malaria show visual cues and features that mark infection, but in some cases these are subtle and quite difficult to detect. This is where neural networks can come in handy -- they may be able to pick up particular features that a doctor's eye cannot see.

Comparison of various techniques and their relative performance:

  • How do different techniques perform? Which one is performing relatively better? Is there scope to improve the performance further?
    • CNNs that use data augmentation and regularization perform better than CNNs without these techniques. Interestingly, a batch normalization layer appears to be detrimental to performance -- this may be because too few epochs were run for the validation accuracy and the gradients within the model to stabilize. More epochs and more compute time may improve performance. Additionally, modeling with more data augmentation methods, such as those available in the Albumentations library mentioned in the previous cell, may help performance.
    • The performance of the model could be slightly better, reaching 99% or perhaps even 100% accuracy. The techniques mentioned above may help with this, but it would be more fruitful to acquire data from a variety of sources, such as different hospitals and organizations with labeled data. This dataset appears to have come from a single source, given how standardized it is.

Proposal for the final solution design:

  • What model do you propose to be adopted? Why is this the best solution to adopt?
    • Model 3, with data augmentation and a small dropout layer for regularization, is the best solution and performs the best on the dataset. It has the highest recall scores and thus minimizes false negatives, which is highly desirable for the health problem this model aims to help solve. It is also a relatively simple model, at only 542,514 parameters, which makes it fast and highly deployable where data and storage are constrained. Its simplicity is its strength, since it can be easily modified and upgraded on the ground, where it is to be used. In places with less developed technology, sparse internet access, and lower computing resources overall, this model is light and simple enough to be quickly modified and retrained, yet complex enough to deliver high accuracy.
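To give a rough sense of what the parameter counts mean for deployment, the sketch below estimates the size of the raw weights. It assumes every parameter is stored as a 32-bit float (4 bytes), which is an assumption rather than something stated in the notebook, and it ignores file-format overhead, optimizer state, and activation memory. The VGG16 figure uses the approximate ~15M count quoted earlier.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights stored as 32-bit floats


def weight_size_mib(num_params, bytes_per_param=BYTES_PER_FLOAT32):
    """Estimate raw weight storage in MiB for a given parameter count."""
    return num_params * bytes_per_param / (1024 ** 2)


model3_params = 542_514     # Model 3, as reported in the notebook
vgg16_params = 15_000_000   # Model 4 (VGG16), approximate figure from the notebook

model3_mib = weight_size_mib(model3_params)
vgg16_mib = weight_size_mib(vgg16_params)

print(f"Model 3: {model3_mib:.2f} MiB")
print(f"VGG16:   {vgg16_mib:.2f} MiB")
print(f"Ratio:   {vgg16_mib / model3_mib:.1f}x")
```

Under these assumptions, Model 3's weights take up about 2 MiB, compared with well over 50 MiB for the VGG16-based model.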